Since the summer of 1992, the Soar/IFOR research group has been building intelligent automated agents
for tactical air simulation. The Soar/IFOR research project exists at three sites: the University of
Michigan, the University of Southern California, and Carnegie Mellon University. The ultimate goal of
this project is to develop automated pilots whose behavior in simulated engagements is indistinguishable
from that of human pilots. Our work has concentrated on developing agents for a variety of air-to-air
and air-to-ground missions.

This technical report is a collection of the research papers that have been generated from this project
between Spring 1994 and Spring 1995. Earlier papers were published as “Collected papers of the
Soar/IFOR project, Spring 1994”, Johnson, W. L., et al., Technical Reports CSE-TR-207-94 from the
Department of Electrical Engineering and Computer Science, University of Michigan; ISI/SR-94-379
from the University of Southern California Information Sciences Institute; and CMU-CS-94-134 from
Carnegie Mellon University. The best overview of this project was published separately as
“Intelligent Agents for Interactive Simulation Environments”, by Tambe, M., Johnson, W. L., Jones, R.
M., Koss, F., Laird, J. E., Rosenbloom, P. S. and Schwamb, K., in AI Magazine, 16(1), 1995.

The research covered in these papers spans a wide spectrum of issues in agent development, such as
learning [2], planning [3], coordination and command and control [4, 6], natural language processing
[5], agent tracking [7, 8, 9], and piloting rotary wing aircraft [10].

The papers are organized with the overview paper first, followed by all of the other papers in
alphabetical order by author.

1.  Laird,  J.  E.,  Johnson,  W.  L.,  Jones,  R.  M.,  Koss,  F.,  Lehman,  J.  F.,  Nielsen,  P.  E.,  Rosenbloom,  P.  S., 
4. Laird, J. E., Jones, R. M. and Nielsen, P. E.

Multiagent Coordination in Distributed Interactive Battlefield Simulations. Abstract published in
Proceedings of the International Conference on Multi-agent Systems (ICMAS), June 1995.

5. Lehman, J. F., Van Dyke, J., and Rubinoff, R.

Natural Language Processing for IFORs: Comprehension and Generation in the Air Combat Domain.
Proceedings  of  the  Fifth  Conference  on  Computer  Generated  Forces  and  Behavioral  Representation. 
Orlando,  FL.  pp.  115-123,  May  1995. 


6.  Nielsen,  P. 

Intelligent Computer Generated Forces for Command and Control. Proceedings of the Fifth
Conference on Computer Generated Forces and Behavioral Representation. Orlando, FL. pp. 211-218,
May 1995.

7.  Tambe,  M. 

Recursive  Agent  and  Agent-group  Tracking  in  a  Real-time  Dynamic  Environment.  In  Proceedings  of 
the  International  Conference  on  Multi-agent  systems  (ICMAS).  June,  1995. 

8.  Tambe,  M.  and  Rosenbloom,  P.  S. 

RESC: An Approach for Real-time, Dynamic Agent Tracking. In Proceedings of the International
Joint Conference on Artificial Intelligence (IJCAI), August 1995.

9. Tambe, M., Schwamb, K., and Rosenbloom, P. S.

Building Intelligent Pilots for Simulated Rotary Wing Aircraft. In Proceedings of the Fifth
Conference on Computer Generated Forces and Behavioral Representation, pp. 39-44, May 1995.

This technical report is being published concurrently at the University of Michigan (CSE-TR-242-95),
the University of Southern California Information Sciences Institute (ISI/SR-95-406), and Carnegie
Mellon University (CMU-CS-95-165).

This  research  was  supported  under  contract  N00014-92-K-2015  from  the  Advanced  Systems 
Technology  Office  of  the  Advanced  Research  Projects  Agency  and  the  Naval  Research  Laboratory,  and 
contract N66001-95-C-6013 from the Advanced Systems Technology Office of the Advanced Research
Projects  Agency  and  the  Naval  Command  and  Ocean  Surveillance  Center,  RDT&E  division. 


Simulated  Intelligent  Forces  For  Air: 
The Soar/IFOR Project 1995


John E. Laird, W. Lewis Johnson, Randolph M. Jones, Frank Koss, Jill F. Lehman,
Paul E. Nielsen, Paul S. Rosenbloom, Robert Rubinoff, Karl Schwamb,
Milind Tambe, Julie Van Dyke, Michael van Lent, and Robert E. Wray, III


Artificial Intelligence Laboratory
University of Michigan
1101 Beal Ave.
Ann Arbor, MI 48109-2110
laird@umich.edu


Information Sciences Institute
University  of  Southern  California 
4676  Admiralty  Way 
Marina  del  Rey,  CA  90292 
rosenbloom@isi.edu 


Computer Science Department
Carnegie  Mellon  University 
Pittsburgh,  PA  15213 
jef@cs.cmu.edu 


1  Abstract 

For  the  last  three  years,  the  Soar/IFOR 
group  has  been  developing  intelligent  forces  for 
distributed  interactive  simulation  environments. 
Since  early  1994,  our  efforts  have  been  focused  on 
developing computer generated forces for air missions
including both fixed wing and rotary wing
aircraft.  This  paper  reviews  the  current  state  of 
the  Soar/IFOR  project  and  discusses  the  results 
of  a  preliminary  trial  of  our  agents  in  STOW-E, 
a  precursor  to  STOW-97. 

2 Introduction

The goal of the Soar/IFOR project is to develop human-like synthetic agents for
populating interactive distributed simulation environments. In contrast to the
standard semi-automated forces (SAF) approach, where it is assumed that some
higher-level authority, such as a human or a computerized command force (CFOR),
will be responsible for all decisions requiring judgement, our approach is to
endow all entities with knowledge and decision making abilities similar to those
found in humans performing similar tasks. Our hypothesis, confirmed in part by
our participation in a large scale simulated exercise called STOW-E, is that
building intelligent forces provides a payoff in terms of increasing the fidelity
of the agents’ behavior, while decreasing the complexity of commanding the agents.

From 1992 through early 1994, our efforts were focused on research and
development for beyond visual range air-to-air combat, leading to the creation
of TacAir-Soar [Jones et al., 1993; Rosenbloom et al., 1994; Tambe et al., 1995a].
In early 1994, we broadened our horizon significantly, and we are now working on
developing automated synthetic pilots for the majority of air missions flown in
the U.S. military. The various missions include air-to-air (defensive combat air
patrols, sweeps), air-to-ground (close air support, interdiction, strategic
attack), air-to-surface, rotary wing (anti-armor), as well as some support
missions (refueling, resupply, etc.). We are also developing additional agents,
such as air and ground controllers, that communicate with the agents flying in
planes and helicopters during their missions. We will refer to all the agents
being developed by the Soar/IFOR project as Air-IFOR agents, while TacAir-Soar
refers to the agents that fly tactical fixed wing aircraft.

During the last year, we have made progress on many of these missions, and in
this paper we will review all aspects of the existing Soar/IFOR agents,
including: the interaction between Air-IFOR agents and DIS, the design of
Air-IFOR agents, their capabilities, the interactions between multiple Air-IFOR
agents, and the participation of Air-IFOR agents in STOW-E.

3  Interaction  with  DIS 

Since the inception of the Soar/IFOR project, our goal has been to create an
abstract interface layer between Air-IFOR agents and the underlying simulation
(DIS) environment. We call this the “virtual cockpit” abstraction, meaning that
Air-IFOR agents should have an interface that supports the types of interactions
a pilot has in the cockpit of a plane or helicopter [Schwamb et al., 1994].
Thus, Air-IFOR agents are isolated from the details of the underlying simulation
environment, network protocol, plane dynamics, sensor simulation, etc.
Currently, we use ModSAF [Calder et al., 1993] as the underlying software which
provides connectivity to the DIS environment as well as simulations of the
vehicle dynamics, sensors, weapons, and communication (radio) systems. To
support the virtual cockpit, we have added C code, which defines a Soar/ModSAF
Interface (SMI) [Schwamb et al., 1994]. The SMI makes all of the appropriate
calls to the underlying ModSAF functions so that Air-IFOR agents get access to
the appropriate sensor and weapons systems. The SMI does not use ModSAF tasks or
taskframes, but instead relies on lower level functions, which gives Air-IFOR
agents finer-grain control of their own behavior.
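
The virtual cockpit idea can be illustrated with a small sketch. This is not the SMI, which is written in C on top of ModSAF; the class names, method names, and the simple turn model below are all hypothetical and exist only to show the shape of the abstraction: the agent reads instruments and sets controls, and never touches the simulation behind them.

```python
from abc import ABC, abstractmethod


class VirtualCockpit(ABC):
    """Hypothetical agent-facing interface: instruments in, controls out.
    Nothing about the network, vehicle dynamics, or sensor models is exposed."""

    @abstractmethod
    def read_instruments(self) -> dict: ...

    @abstractmethod
    def set_desired_heading(self, degrees: float) -> None: ...


class FakeSimulationCockpit(VirtualCockpit):
    """Stand-in backend for illustration; a real one would call the simulator."""

    def __init__(self):
        self._heading = 0.0
        self._desired = 0.0

    def read_instruments(self):
        return {"heading": self._heading}

    def set_desired_heading(self, degrees):
        self._desired = degrees % 360.0

    def tick(self, max_turn=10.0):
        # Turn toward the desired heading by at most max_turn degrees,
        # taking the shorter way around the compass.
        error = (self._desired - self._heading + 180.0) % 360.0 - 180.0
        step = max(-max_turn, min(max_turn, error))
        self._heading = (self._heading + step) % 360.0
```

An agent written against `VirtualCockpit` would run unchanged if the fake backend were replaced by one that forwards the same calls to a real simulator, which is the isolation the paragraph above describes.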

Air-IFOR agents are built within the Soar architecture [Laird et al., 1987;
Laird and Rosenbloom, 1994; Rosenbloom et al., 1991; Rosenbloom et al., 1993].
Soar, the SMI, and ModSAF are integrated (within the same Unix process) so that
each Soar/IFOR agent gets “ticked” during the simulation cycle. Using this
arrangement, we can run multiple, independent agents on a single Unix
workstation, as well as having agents on many different machines, although a
single agent is not distributed across multiple machines. Air-IFOR agents do not
share data except through explicit communication using simulated radios.

As part of building the SMI, we have extended the standard suite of ModSAF
sensors and weapons, adding such devices as a CCIP (continuously computed impact
point) which displays where a bomb will hit if released, a waypoint computer
which displays the appropriate heading to fly to the next waypoint in a flight
plan, air-to-surface missiles (such as the Exocet), and a primitive form of
precision-guided munitions.

One result of our development has been the recognition that the more closely we
model the types of information available to humans, not at the level of visual
perception, but instead at the level of symbolic data, the easier it is to model
the behavior of the humans. For example, we discovered that creating a waypoint
computer and the CCIP greatly reduced the reasoning required by the Soar agents
because they no longer had to respond to every change in their position relative
to a waypoint or target. Instead they could respond to the changes in the
heading suggested by the waypoint computer or CCIP.
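
The effect of the waypoint computer can be sketched in a few lines. The following Python is not the SMI code; it assumes a flat x/y frame (x east, y north), headings in degrees clockwise from north, and a hypothetical 5-degree reporting resolution. What it shows is that the agent receives a new symbol only when the suggested heading changes, rather than on every position update.

```python
import math


def suggested_heading(pos, waypoint, resolution_deg=5):
    """Heading (degrees clockwise from north) from pos to waypoint, rounded
    to a coarse resolution so that small position changes are hidden."""
    dx = waypoint[0] - pos[0]   # east offset
    dy = waypoint[1] - pos[1]   # north offset
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    return (round(bearing / resolution_deg) * resolution_deg) % 360


class WaypointComputer:
    """Reports a heading to the agent only when the suggestion changes."""

    def __init__(self, waypoint):
        self.waypoint = waypoint
        self.last = None

    def update(self, pos):
        heading = suggested_heading(pos, self.waypoint)
        if heading == self.last:
            return None          # nothing new for the agent to reason about
        self.last = heading
        return heading
```

With this arrangement, a rule that tests the suggested heading matches again only when `update` returns a value, while a rule that tests raw position would have to be re-evaluated on every simulation step.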

A problem we foresee in the future is the management of many Soar/IFOR agents
during a protracted exercise. The problem is not in terms of command and control
(covered in the section on multiagent interactions), but is in terms of managing
the creation, reuse, and destruction of Air-IFOR agents on many different
workstations. To this end, as well as to support cleaner interfaces to Soar
agents, we have integrated Soar with Tcl [Ousterhout, 1994], a scripting
language that will help support agent management across many machines.

4  Agent  Design 

The overall design of Air-IFOR agents has not changed significantly over the
last year, although it has been refined and augmented with new tools. Nor have
the basic requirements of Air-IFOR agents changed. They continue to be the
following:

1. Encode large bodies of knowledge about relevant aspects of the world,
including tactics, doctrine, sensors, weapons, etc.

2. React quickly to the environment, such as the behavior of enemy planes,
communications from other friendly agents, and changes in terrain being
traversed.

3. Determine the tactically relevant features of a complex, dynamic environment.

4. Coordinate behavior with other agents.

5. Use minimal computational resources.

6. Deliberately plan aspects of missions not specified in orders.

4.1 Method and Approach

All of the Soar/IFOR agents are developed within the Soar architecture. Soar has
its roots in early AI symbolic systems such as LT [Newell and Simon, 1956] and
GPS [Ernst and Newell, 1969], as well as rule-based systems, such as OPS5
[Forgy, 1982]. Soar supports the above requirements by providing two integrated
levels of computation: deliberate, sequential operators within problem spaces,
and automatic parallel rules. In terms of the tasks that have to be performed by
Air-IFOR agents, it is easiest to think in terms of the first level, operators.
We make the claim that sequences of deliberate operators are the most
appropriate way to model the second to second behavior of a pilot (or any human
for that matter). Example operators include flying a mission, picking a control
point to fly to, intercepting a bandit, entering a waypoint into the plane’s
waypoint computer, deciding which missile to fire, physically selecting that
missile, pushing the fire button, and so on. Some of these are purely mental
operators, such as deciding which missile to select, while others include
physical actions. Many of these operators cannot be performed directly as a
single act, but instead must be decomposed into subgoals where finer-grain
operators are selected and applied. For example, the act of intercepting a
bandit is decomposed into many different operators, such as achieving proximity,
employing weapons, and so forth.

Thus, Soar organizes the doctrine and tactics of flying missions in planes and
helicopters in terms of hierarchies of operators. For a given operator that the
agent is trying to pursue, such as an intercept, the operators used to achieve
it are grouped in terms of problem spaces. They are called problem spaces
because their constituent operators determine the space of problems that can be
solved. Operators can be shared among more than one problem space. For example,
setting the waypoint computer is used in flying routes, as well as flying
BARCAPs. Other, so-called floating operators are available in every active
problem space. Floating operators, such as operators that detect changes in a
bogey’s activity, are very sensitive to changes to the environment and usually
need to be selected soon after they become relevant. More generally, the
hierarchical and floating operators can be seen as at opposite ends of two
dimensions: sensitivity to the agent’s current goals, and sensitivity to the
current situation. All operators must be sensitive to both concerns, but
floating operators emphasize reacting to the current situation (within the
context of the current goals), while hierarchical operators emphasize responding
to the current goals (within the context of the current situation).
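
The relationship between problem spaces, shared operators, and floating operators can be sketched as a small data structure. The operator names below are hypothetical, loosely based on the examples in the text; the real agents express this structure through Soar rules, not a Python table.

```python
# Hypothetical problem spaces: each maps an operator being pursued to the
# finer-grain operators that can be used to achieve it.
PROBLEM_SPACES = {
    "fly-mission": ["fly-route", "fly-barcap", "intercept"],
    "fly-route":   ["set-waypoint-computer", "fly-to-waypoint"],
    "fly-barcap":  ["set-waypoint-computer", "fly-racetrack"],
    "intercept":   ["achieve-proximity", "employ-weapons"],
}

# Floating operators are available in every active problem space.
FLOATING = ["detect-bogey-change"]


def available_operators(current_operator):
    """Operators that may be proposed while pursuing current_operator:
    its own sub-operators plus all floating operators."""
    return PROBLEM_SPACES.get(current_operator, []) + FLOATING


def spaces_using(operator):
    """Problem spaces that share a given operator."""
    return sorted(s for s, ops in PROBLEM_SPACES.items() if operator in ops)
```

Here `spaces_using("set-waypoint-computer")` lists both the route-flying and BARCAP spaces, mirroring the shared operator in the example above, and `available_operators` always includes the floating operator regardless of the active goal.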

Within a subgoal, local situational information is held in the subgoal’s state.
Each subgoal has access to all of the state information in its supergoals, and
the state of the top goal contains all the data used to fly a mission, including
all sensor data, the agent’s interpretation of the current situation, a
description of the current mission, data on other agents, etc.


The hierarchical operator structure provides the necessary framework for
encoding knowledge and organizing the behavior of Air-IFOR agents; however, it
alone is insufficient to provide flexibility and reactivity. What is needed is
the ability to dynamically propose, select, and apply the operators that are
appropriate for the current situation. This is done in Soar through its
underlying rule-based system, which directly implements the selection,
application, and termination of operators described above. Thus, there are rules
which test the current situation and propose operators, rules which compare
proposed operators and suggest preferences between operators, rules which test
that an operator has been selected and then perform some aspect of the operator,
and rules that test that all aspects of an operator have been completed and
signal that the operator is finished. The actual selection of operators is not
done directly by individual rules, but by a decision procedure, which selects an
operator based on all relevant preferences.

Most rule-based systems use a conflict-resolution scheme to select a single rule
to fire on each cycle. However, rules from these systems map more directly onto
Soar’s operators, which are the locus of deliberate activity in Soar, and where
selection is controlled by preferences and the decision procedure. Soar’s rules
are more like an associative memory, where the information in actions of rules
is recalled whenever the conditions of the rules match. Thus to retrieve all
information relevant to the current situation, the basic cycle is to fire all
rules that match the current situation, and continue firing until quiescence.
During this rule firing phase, rules to implement the current operator are
firing, as well as rules proposing new operators. At quiescence, assuming the
current operator is finished, a decision is made to select a new operator based
on the available preferences, and the cycle begins again. If the current
operator cannot be finished, possibly because it requires problem solving in a
subgoal, a subgoal will be created automatically, and then rules sensitive to
the subgoal will fire to suggest appropriate operators. When a rule detects that
the original operator is finally complete (or should be abandoned), it will fire
and cause a new operator to be selected, and the immediate subgoal (and any
additional subgoals) will be automatically removed. Soar is integrated with
ModSAF so that one decision is made for each agent during each clock tick of the
simulation, and thus 2 to 15 decisions are made in each Air-IFOR agent each
second.
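
The cycle described above (fire all matching rules until quiescence, then let a decision procedure choose an operator) can be sketched as follows. This is a deliberately simplified illustration, not Soar itself: the state is a plain set of tuples, rules are Python functions, and preferences are reduced to a single number, whereas Soar uses symbolic preferences. Subgoal creation is not modeled; the rule names and facts are hypothetical.

```python
def elaborate(state, rules):
    """Fire every matching rule repeatedly until no rule adds anything new
    (quiescence). Each rule maps the current state to a set of facts."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(state) - state
            if new:
                state |= new
                changed = True
    return state


def decide(state):
    """Toy decision procedure: pick the proposed operator with the highest
    numeric preference; return None if nothing is proposed."""
    proposals = [f for f in state if f[0] == "propose"]
    if not proposals:
        return None
    return max(proposals, key=lambda f: f[2])[1]


# Two hypothetical proposal rules.
def see_bandit(state):
    if ("bandit", "in-range") in state:
        return {("propose", "intercept", 2)}
    return set()


def default_route(state):
    return {("propose", "fly-route", 1)}


state = elaborate({("bandit", "in-range")}, [see_bandit, default_route])
print(decide(state))  # prints: intercept
```

Note that no single rule selects the operator: both rules only add proposals, and the choice is made once, after quiescence, from everything that was proposed.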

4.2  Infrastructure 

In maintaining a rule-based system, the rules must be organized so that it is
easy to find rules, not only by their name, but also by their role in producing
behavior. For the Soar/IFOR agents, we have mapped the hierarchical structure of
the operators onto the hierarchical structure of the Unix file system. Thus,
each goal (or subgoal) has its own directory, and within that directory there
are files for each of the operators, plus a file for loading in those operator
files. For cases where rules are not shared across agents, we have a dynamic
load facility that loads only the subset of the code that is relevant to the
current agent’s vehicle and mission.
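
A minimal sketch of this directory convention is given below. The file extension, the name `load.soar`, and the `source` lines written into it are assumptions made for illustration; the text does not specify the actual file names or load syntax. The sketch also assumes the operator hierarchy is acyclic.

```python
from pathlib import Path


def layout(root, goal, tree):
    """Create one directory per goal, one file per operator in it, and a
    load file naming those operator files. `tree` maps a goal to its
    operators; operators that have their own subgoal get a subdirectory."""
    goal_dir = Path(root) / goal
    goal_dir.mkdir(parents=True, exist_ok=True)
    operators = tree.get(goal, [])
    for op in operators:
        (goal_dir / f"{op}.soar").touch()      # hypothetical file naming
        if op in tree:                         # operator with its own subgoal
            layout(goal_dir, op, tree)
    (goal_dir / "load.soar").write_text(
        "".join(f"source {op}.soar\n" for op in operators))
    return goal_dir
```

Calling `layout(some_dir, "fly-mission", tree)` with a tree such as `{"fly-mission": ["intercept"], "intercept": ["achieve-proximity", "employ-weapons"]}` yields a `fly-mission/` directory containing `intercept.soar`, `load.soar`, and an `intercept/` subdirectory with its own operator and load files, so a rule can be located by its place in the goal hierarchy.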

Our lowest-level documentation of the problem spaces, operators, and rules is
also organized in the same hierarchical file structure, with direct links from
the documentation to the code [Koss and Lehman, 1994]. A higher level of
documentation, using the terminology and structure of our domain experts, links
into the problem space documentation to currently support a limited form of
validation. All of our documentation is in HTML and it can be accessed through
viewers such as Mosaic and Netscape.

To support the creation of the code and documentation with our conventions, we
have created the Soar Development Environment (SDE) [Hucka and Laird, 1995],
which is an extension to Emacs. SDE has a template language that can be used to
automatically generate all of the necessary directories, code, and documentation
files when new operators are created. SDE also provides many features to aid in
debugging, such as automatic finding of files in which rules are stored, point
and click commands for common functions, and general search facilities for the
rules.

4.3  Current  Status  and  Lessons  Learned 

The current Air-IFOR agents have a combined total of approximately 320
operators, with a total of 3,100 rules. Individual agents have between 1,130 and
2,550 rules depending on their missions. These counts do not include our natural
language or debriefing systems, which by themselves have substantial numbers of
rules.

One of the challenges in building the agents has been to maintain the
computational efficiency of the system as we add new capabilities. The problem
is not that Soar slows down as the sheer number of rules increases (research
indicates that Air-IFOR agents may be able to grow to even a million rules
without this being an issue [Doorenbos, 1994]), but instead the problem is that
it is easy to write rules that fire every time some input data changes (such as
when the current position of the plane changes). As a result, we closely monitor
rule firings in order to identify costly rules, and then attempt to rewrite them
in order to decrease their cost. In a few cases, we have discovered that by
removing a computation from Soar that is done in the cockpit for a pilot, such
as with the waypoint computer and the CCIP, we have been able to drastically
reduce the computational overhead in Soar.
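
The kind of monitoring described above can be illustrated with a simple firing counter. This is only a sketch under a strong simplifying assumption: each rule tests a single input value and fires whenever that value differs from the previous input snapshot. Real rule matching in Soar is far richer, and the rule and input names here are hypothetical.

```python
from collections import Counter


def count_firings(rules, inputs):
    """Count how often each rule fires over a stream of input snapshots.
    `rules` maps a rule name to the input key it tests; a rule fires when
    the value of that key differs from the previous snapshot."""
    firings = Counter()
    previous = {}
    for snapshot in inputs:
        for name, tested_key in rules.items():
            value = snapshot[tested_key]
            if previous.get(name, object()) != value:
                firings[name] += 1
            previous[name] = value
    return firings


# 100 simulation steps: position changes every step, heading never does.
inputs = [{"position": (i, 0), "heading": 90} for i in range(100)]
rules = {"track-position": "position", "follow-heading": "heading"}
firings = count_firings(rules, inputs)
```

In this toy run the rule that tests raw position fires on every step, while the rule that tests the suggested heading fires once, which is the pattern that makes moving such computations into cockpit devices worthwhile.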

During agent development, we are able to run 6-10 agents on a single 150MHz 4400
SGI Indy. However, one of the lessons we learned from STOW-E is that we are
limited to around 4 agents when there are large numbers of entities on the
network. This is because of overhead in both ModSAF and the Soar agents that
results from the processing of large numbers of entities. In response, we expect
to put more emphasis on focusing attention on only the most important entities
at all levels of processing, as well as to continue research on efficient
matching of rule-based systems [Acharya and Tambe, 1993; Kim and Rosenbloom,
1993].

5  Agent  Capabilities 

Although Soar provides the basic architecture for building Air-IFOR agents, our
agents are more than a large collection of rules that directly encode doctrine
and tactics. They must also have many cognitive capabilities, some of which are
directly related to military flying, such as following a flight plan,
situational awareness, planning attacks, employing weapons, and managing fuel,
while others are more general cognitive capabilities, such as communicating with
other agents, modeling the behavior of other agents, being able to explain the
agent’s behavior, and using general problem solving strategies.

To date, we have discovered that although these general cognitive capabilities
are important, we have been able to build viable agents by concentrating on
those capabilities directly related to performing our agents’ missions. Thus, we
have developed and incorporated capabilities for following flight plans,
planning attacks, employing weapons, situational awareness, managing fuel, and
so on. All of these are the building blocks for various missions. There are also
many capabilities dealing with coordinating behavior among multiple agents,
which are discussed in the section on multiagent interactions. These
capabilities are all implemented as operators that have complex subgoals. For
example, following a flight plan involves many operators including flying routes
(of which there are different types depending on the aircraft), performing
various activities at waypoints (such as communicating with control agents or
determining if a plane should delay at the point so that it arrives on target at
the appropriate time), selecting the next route, and processing any changes the
agent might receive to its mission. We expect these capabilities to be reused on
future missions, possibly with modification as new variants are required.

We expect that the more general cognitive capabilities will become necessary as
we try to create agents which are more autonomous, and thus able to handle novel
situations on their own. To that end, we are pursuing research in the following
areas:

1. Natural language processing: Even with the advent of the Command and Control
Simulation Interface Language (CCSIL) [Salisbury, 1995], we will someday want
Air-IFOR agents to directly interact with humans. Air-IFOR agents will need to
understand and generate natural language, with one of the challenges being to
integrate the processing of language with all of the other agents’ tasks
[Lehman et al., 1995].

2. Behavior explanation: As the complexity of Air-IFOR agents grows, it is
necessary for each of them to be able to explain its own behavior and internal
reasoning. What action did it take, why did it take that action, why did it
interpret the situation in the way it did, and what were other options? We have
been actively pursuing these issues in the Debrief system, which is a set of
Soar rules that, when included in an agent before a run, allows the agent to be
debriefed after flying a mission [Johnson, 1994].

3. Agent modeling: In order to interpret the actions of other agents, Air-IFOR
agents must have some understanding of what the other agents are thinking. This
is currently done in very specialized and context specific ways in Air-IFOR
agents. However, as we start to explore complex behavior, it will be necessary
for Air-IFOR agents to create general internal models of what other agents are
thinking about the current situation. For example, deceptive maneuvers involve
generating behaviors with the goal of leading an opponent to incorrectly guess
what your intent and action really are. We can currently encode “deceptive”
maneuvers in Air-IFOR agents; however, for the agent itself to derive an
appropriate deceptive maneuver in novel situations requires the ability to
internally model some of the thought processes of other agents, a problem we are
actively pursuing [Tambe and Rosenbloom, 1995].

4. General Problem Solving and Planning: Our current agents have all the
necessary control knowledge for making the decisions we expect them to
encounter. However, acquiring this knowledge is difficult and time-consuming,
and this knowledge alone does not always lead to robust performance in novel
situations. Over the last year, we have done research on more general problem
solving and planning approaches that can use more “fundamental” knowledge of the
domain and thus increase the ability of Air-IFOR agents to respond to novel
situations. Using experimental versions of TacAir-Soar, we have demonstrated the
feasibility of integrating both look-ahead planning [van Lent, 1995] and
means-ends analysis [Wray, 1995] into Air-IFOR agents.

In addition to the more general capabilities listed above, Air-IFOR agents must
have knowledge that includes the doctrine and tactics appropriate to the
missions they are to perform. Currently, Air-IFOR agents fly the following
fixed-wing missions: BARCAP, Close Air Support, Strategic Attack, and MiGSweep.
For rotary wing, Air-IFOR agents can fly a basic anti-armor mission [Tambe et
al., 1995b]. In addition, we have developed the following agents that act as
controllers during missions [Nielsen, 1995]:

• Air Intercept Controller (AIC) and Ground Controlled Intercept (GCI), which
give information and commands about enemy planes. The AIC is situated in a plane
with a large radar, such as an E2C.

• Forward Air Controller (FAC), which provides final directions for close-air
support missions.

• Direct Air Support Center (DASC), which assigns aircraft to missions, can
change the mission, and hands off control to the FAC.

• Fire Support Coordination Center (FSCC), which determines the type of support
to utilize (close air support, artillery, or naval gunfire); if close air
support is chosen, it generates a tactical air request form and then sends the
request to the DASC.

• Tactical Air Command Center (TACC), which provides air traffic control,
intermediate routing, and deconfliction.

• Tactical Air Direction (TAD) controller, which directs specific air operations
within the area of operations, prior to the establishment of a DASC.

We have operational versions of all of these agents, although many are limited
to producing behavior that is only relevant to close air support and air-to-air
missions.

6  Multiagent  Interactions 

Although the individual agents are by themselves
important, it is the coordination of agents
that leads to effective military forces. Our approach
is to model the methods and practices
of military organizations. Air-IFOR agents coordinate
their activities through a combination
of common background knowledge (their knowledge
of military methods, procedures, doctrine
and tactics), common mission statements, and
explicit communication (non-verbal and verbal)
[Laird et al., 1995]. Because Air-IFOR agents
know what they are supposed to do and when
(because of their background knowledge and mission
statements), the need for explicit communication
is greatly reduced. Also, in contrast to
SAF agents, Air-IFOR agents are “smart” enough
to deal with the details of executing all aspects
of the missions they have been assigned and do
not require constant monitoring by a human or
command agent. When explicit verbal communication
is used, we attempt to model both the
content and form used by real pilots. Thus,
Air-IFOR agents send simulated radio messages
whose content closely mirrors the English words
and phrases used by real pilots. The generation
and interpretation of these messages is currently
done by a fixed set of templates and not a general-purpose
natural language facility (although one is
under development [Lehman et al., 1995]). Air-IFOR
agents currently can generate and interpret
approximately 100 different types of messages.
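
As a rough illustration of how a fixed set of templates can serve both generation and interpretation, the following Python sketch fills slots to produce a message and matches incoming text against the same templates to recover the slots. The two templates, the slot names, and the functions `generate` and `interpret` are hypothetical; the report does not list the actual templates or describe how they are implemented in Soar.

```python
import re

# Hypothetical templates for illustration; the report does not list the real ones.
TEMPLATES = {
    "contact-report": "{sender}, contact bearing {bearing}, range {range_nm} miles",
    "change-altitude": "{receiver}, {sender}, climb to angels {angels}",
}


def generate(msg_type, **slots):
    """Produce message text by filling the slots of a fixed template."""
    return TEMPLATES[msg_type].format(**slots)


def _compile(template):
    """Convert a template into a regex with one named group per slot."""
    pieces = re.split(r"\{(\w+)\}", template)  # odd indices are slot names
    pattern = ""
    for i, piece in enumerate(pieces):
        if i % 2 == 0:
            pattern += re.escape(piece)          # literal text between slots
        else:
            pattern += "(?P<%s>[\\w-]+)" % piece  # slot value: word chars or hyphens
    return re.compile("^" + pattern + "$")


PATTERNS = {name: _compile(text) for name, text in TEMPLATES.items()}


def interpret(text):
    """Return (message type, slot values) for the first matching template, else None."""
    for name, pattern in PATTERNS.items():
        match = pattern.match(text)
        if match:
            return name, match.groupdict()
    return None
```

A message produced by `generate("contact-report", sender="Hawk-1", bearing=270, range_nm=40)` can be passed to `interpret`, which returns the message type and the slot values as strings. Text that matches no template is rejected, which reflects the main limitation of a template approach compared with a general natural language facility.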

When flying as a unit, most of the coordination
occurs by the wingman visually observing and responding
to the behavior of the lead of the unit.
The wingman constantly adjusts its position to
stay in the appropriate formation. The wingman
also keeps track of the progress of the unit in its
mission, observing the achievement of waypoints.
Depending on the mission details, the wingman
may change formation, break formation to fly an
independent ground attack, rejoin the formation
following an attack, or even take over as the lead.

Currently, TacAir-Soar agents (Air-IFOR
agents for tactical fixed wing aircraft) are able
to fly as either sections (two planes) or divisions
(four planes). They can fly a variety of formations
and they can dynamically break into smaller
units, such as a division splitting into two sections,
and then later re-form as a single unit.
Within a section, the lead and wingman can coordinate
their radars (covering different parts of the
sky and communicating enemy contacts) as well
as their weapons employment during
air-to-air engagements. During air-to-ground
attacks, a section can use a variety of coordinated
tactics, which are planned by the lead at the beginning
of the mission. Our work on coordination
within rotary wing units is also under development;
currently the helicopters can fly in
pairs, with the expected progression to platoons
and then companies during the next year.

A unit of TacAir-Soar agents, such as a section
or division, will also coordinate its behavior
with available controllers (AIC, GCI, FAC, TAD,
TACC, FSCC, DASC) [Nielsen, 1995]. The controllers
can give the unit flight information (such
as the altitude to fly at, or the name of the next
controller), permission to continue the mission
(permission to enter an area, or permission to attack
a target), information on other planes, or
changes to missions. In the case of changing a
mission, a controller can dynamically change almost
any aspect of a ground attack mission including
the route, the time on target, and the
final target. When a mission change is received,
the members of the unit change their missions,
sometimes replanning the final attack for air-to-ground
missions.

Our goal is to continue to build up the coordination
of Air-IFOR agents into integrated
missions. We are currently close to completing
close-air support, which involves a variety of
controllers plus planes doing individual missions.
However, missions such as offensive strike and
integrated interdiction can involve a variety of
different planes flying many different individual
missions (strategic attack, RECCE, MiGSweep,
SEAD, etc.) that have to be closely orchestrated
to pull off the complete mission. We plan on
working on these missions and the required coordination
over the next year.

Our approach to date has been to support the
coordination of activities within the set of agents
under our direct control. We have been able to
develop our own templates independent of other
groups. However, in the future some Air-IFOR
agents will need to communicate with other command
forces, and thus, we will soon be using
CCSIL protocols for communication between our
agents and their controllers.

7  STOW-E

During November 4-7, 1994, a large scale operational
military exercise called STOW-E (Synthetic
Theater Of War - Europe) was held across
18 installations in the United States and Europe. At
its peak, over 1,800 entities were simulated on
the Defense Simulation Internet (DSI). Although
the vast majority of the entities were involved in
ground actions, there was also a significant number
of air missions being flown using humans in
simulators, ModSAF agents, Soar/IFOR agents,
and in a few cases, real planes with instrumentation
that allowed them to be sensed within the
DIS environment (although these planes could
not sense the DIS entities). For the Soar/IFOR
group, this was the first chance to participate
in a realistic, large scale simulation environment
where we did not have complete control over the
scenarios.

Over the four day period, the Soar/IFOR
agents were scheduled to participate in 10 events.
For each event we had specific missions assigned
to Air-IFOR agents that had been given to us
weeks in advance. These missions included defensive
air missions (BARCAPs), offensive air missions
to disrupt BARCAPs, air to ground missions,
and air to surface missions.

We successfully fielded agents for every event
in which we were scheduled (10 events, approximately
32 agents) and participated in many unscheduled
events (5-7 events, approximately 16
agents). TacAir-Soar performed air-to-air missions
against ModSAF and humans (in simulators).
TacAir-Soar attempted to engage planes
from other sites, but because of problems with
the network, the other agents did not see TacAir-Soar.
We also participated in air-to-ground
(bombing bridges, etc.) and air-to-surface (firing
missiles at ships) attacks in which we engaged
ground and surface targets from other sites.

We did have a limited number of software failures,
with the most significant being our inability
to fly over the terrain database where the ground
battle was raging when it was populated with
hundreds of tanks. This was caused by a software
bug in our C code for processing ground targets
using radar.

One of our goals was to provide viable opponents
for simulated and human pilots; however, it
was difficult to evaluate the “skill” of our TacAir-Soar
agents because of some problems with the underlying
simulation models. For example, during
the first day, we were frustrated with the performance
of TacAir-Soar agents in engagements. They
were easily shot down by ModSAF F/A-18’s. We
later learned that in order to populate the simulation
with different types of planes, the F/A-18’s
were created by copying F-14’s. The F/A-18’s
were therefore carrying Phoenix missiles, which
have a much longer range than any missile carried
by an F/A-18. TacAir-Soar, basing its tactical
behavior on the known properties of F/A-18’s,
was caught by surprise (as it should have been).

In engagements with humans, our planes would
often get into good tactical positions, only to see
our missiles miss when they were shot. (TacAir-Soar
did have some kills against humans in simulators,
but in general, TacAir-Soar got “toasted”.)
We believe that the missiles missed because of
flaws in the ModSAF missile models. Thus, although
TacAir-Soar got shot down, it was in general
using appropriate tactical maneuvers. Independent
of the specific outcome, this exercise
proved the value of taking systems out of the laboratory
and testing them in more realistic situations.

Possibly the best example of our capabilities
was in the execution of an unscheduled event on
the second day. In this mission, a section of F/A-18’s
was to perform a ground attack against a set
of islands in the simulated battle area. Our planes
were used in place of a virtual (manned) ground
attack because of the failure of that simulator.
Enroute to the target, the planes were unexpectedly
intercepted by ModSAF MiG-29’s. The
F/A-18’s engaged the MiG-29’s to defend themselves
and got off one or two shots (but no kills).
The MiG-29’s disappeared from the network, and
our planes automatically returned to their air-to-ground
attack mission. Further enroute, they
were unexpectedly fired on from a surface-to-air
site, killing the wingman (not only did the planes
not expect it, we didn’t realize there would be any
surface-to-air systems in STOW-E; this was clearly an
unscripted interaction). The lead continued on,
successfully dropping bombs on the designated
target and then heading back to base.

Although we considered our participation in
this exercise a success, it did demonstrate some
weaknesses that we must address in the future.

• Number of vehicles: We discovered that for
an exercise with a large number of vehicles,
we were not able to run the number of vehicles
per workstation that we had expected. Part
of this is the overhead in the network processing
code of ModSAF, but it also was a problem
for our AIC/E2C agent, which could see a large
number of agents at once because of its radar.
This has led us to use more deliberate focusing
of attention in Air-IFOR agents so that
they do not attempt to process the complete
situation at once, but instead concentrate on
subsets of the situation, preferably those that
are relevant to the current tactical situation.

• Mission set up: Before STOW-E, we had not
developed any tools to help specify and manage
the missions of Air-IFOR agents. During
STOW-E, it was time-consuming and error-prone
for us to create or modify the missions.
As a result, we are currently developing graphical
interface tools that will make it possible
to enter and modify missions directly, without
editing intermediate data structures. Our goal
is that our interface should give the user the
same look and feel as the documents and tools
used by pilots in their normal briefings. The
integration of Tcl and Soar is making this much
easier because of its ability to manage windows
and build formatted graphical and textual interfaces.
In the future we must also have the
ability to accept missions from other software
systems using CCSIL; however, the details of
the protocols have yet to be defined.

• Runtime control: Once Air-IFOR agents received
their missions, they would fly the missions
without any human management. Thus,
we became observers and ran our exercises
“hands-off”. In contrast, the ModSAF planes
required constant attention, with a human controlling
their behavior on and off during the
exercise. Although we wish to continue our
approach, we also came to recognize that we
needed the ability to dynamically change some
aspects of the missions of Air-IFOR agents
during the exercise, such as changing the waypoint
at which a section of planes is stationed.
These are relatively minor changes to TacAir-Soar.
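
The focusing of attention described under “Number of vehicles” above can be pictured as a filter applied to the contacts an agent perceives before any further reasoning takes place. The Python sketch below is only an illustration: the `Contact` fields, the relevance test (hostile and within a given range), and the `limit` parameter are assumptions, since the report does not specify how Air-IFOR agents select the relevant subset.

```python
import math
from dataclasses import dataclass


@dataclass
class Contact:
    name: str
    x: float       # position in arbitrary map units (assumed representation)
    y: float
    hostile: bool


def focus(contacts, own_pos, max_range, limit):
    """Return at most `limit` hostile contacts within `max_range`, nearest first."""

    def distance(contact):
        return math.hypot(contact.x - own_pos[0], contact.y - own_pos[1])

    # Assumed relevance test: only hostile contacts inside the range of interest.
    relevant = [c for c in contacts if c.hostile and distance(c) <= max_range]
    return sorted(relevant, key=distance)[:limit]
```

With such a filter the cost of each decision depends on `limit` rather than on the total number of entities visible on the radar, which is the property that matters for an agent such as the AIC/E2C.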

This exercise has the additional significance of
demonstrating that “hard core” AI technology
can be successfully used in an operational exercise
(although in STOW-E this was in a limited
role). We believe that this is one of the first times (if
not the first) that an AI system has been
used in this way.

8  Summary  and  Conclusions

In the beginning of the Soar/IFOR project,
there were many questions as to whether it was
practical to develop intelligent forces for synthetic
environments. Although there is still much more
work to do, three years of research and development
have brought us to the point where we
can state with some degree of certainty that intelligent
forces are practical and will play a significant
role in STOW-97. It is difficult to isolate
specific parts of our methodology or underlying
technology as responsible for this success,
although clearly we believe that the underlying
Soar architecture is responsible to a significant
degree. Its ability to combine fine-grain reactive
reasoning of rules with more deliberate and hierarchical
decision making using operators within
problem spaces appears to be well matched to
the demands of the interactive simulation and the
cognitive processes of the humans we are attempting
to model.

One surprise has been our ability to build complex
and relatively general systems while not using
many of the more advanced techniques such as
means-ends analysis, planning, learning, complex
agent modeling, or natural language. However,
we still believe that these are critical capabilities
for building robust, general agents, and we are
continuing to pursue research in these areas.

In the immediate future, we will continue to
expand the breadth of missions and capabilities
of Air-IFOR agents. For fixed wing, a primary
goal is to develop the appropriate agents to fly
integrated interdiction and strategic attack missions.
The coordination of many different types
of aircraft with different missions promises to be
challenging. In rotary wing, our goal is to field
a complete company of attack helicopters. Our
plan is for these developments to lead up to a
successful participation of Soar/IFOR agents in
STOW-97.

1  Introduction

The Soar/IFOR project is developing human-like,
intelligent agents that can interact with humans, and
with each other, in battlefield simulations [10]. Our
agents play a variety of roles such as fighter pilots,
helicopter pilots, and airspace controllers. The fighter
pilot agents in particular have been successfully deployed
in large-scale simulation exercises, such as the
Synthetic Theater of War (STOW) exercise in November,
1994, which modeled a four day battle scenario
involving approximately 2000 military vehicles. Autonomous
agents such as Soar/IFOR agents are expected
to continue to play a major role in battlefield
simulations, which in turn are expected to provide an
essential tool for military planning and training in the
future.

Soar/IFOR agents are implemented in Soar, a problem
solving architecture that integrates a number of
human cognitive functions, including problem solving,
perception, and learning [4]. Learning occurs through
the application of a general mechanism called chunking
that summarizes the results of processing on subgoals,
in the form of rules that can apply to similar subgoals
in the future. This chunking process is a form of
explanation-based learning (EBL) [7, 6]. Chunking can
lead to speedup in learner performance, and is instrumental
to the learning of new concepts. Some Soar
systems have managed to learn thousands, and even
hundreds of thousands, of chunks [2].

From the previous experience with learning in Soar,
it was taken as a given that the Soar/IFOR agents
could be made capable of applying chunking in service
of their performance requirements. The first research
question that we focus on in this paper is then the
following: What kinds of knowledge can Soar/IFOR
agents learn in the combat simulation environment?
In our investigations so far, we have found a number
of learning opportunities in our systems, which yield
several types of learned rules. For example, some rules
speed up the agents’ decision making, while other rules
reorganize the agents’ tactical knowledge for the purpose
of on-line explanation generation.

Yet, it is also important to ask a second question:
Can machine learning make a significant difference in
Soar/IFOR agent performance? The main issue here
is that battlefield simulations are a real-world application
of AI technology. The threshold which machine
learning must surpass in order to be useful in this environment
is therefore quite high. It is not sufficient to
show that machine learning is applicable “in principle”
via small-scale demonstrations; we must also demonstrate
that learning provides significant benefits that
outweigh any hidden costs.

Thus, the overall objective of this work is to determine
how machine learning can provide practical
benefits to real-world applications of artificial intelligence.
Our results so far have identified instances
where machine learning succeeds in meeting these various
requirements, and therefore can be an important
resource in agent development. We have conducted
extensive learning experiments in the laboratory, and
have conducted demonstrations employing agents that
learn; to date, however, learning has not yet been employed
in large-scale exercises. The role of machine
learning in Soar/IFOR is expected to broaden as practical
impediments to learning are resolved, and the
capabilities that agents are expected to exhibit are
broadened.


2  The  Problem  Domain 

Soar/IFOR agents are designed to work within distributed
interactive simulations (DIS) of military exercises.
But unlike conventional “semi-automated” entities
in distributed simulations, Soar/IFOR agents are
fully capable of autonomous decision making without
outside human intervention. They are intended to be
realistic models of military agent behavior, so much
so that to an outside observer their behavior is indistinguishable
from that of people. They must perform
most if not all of the functions that human personnel
would be called upon to perform, e.g., to issue and/or
understand commands, to coordinate their activities
with friendly forces, and to interpret and respond to
the actions of enemy units. Needless to say, meeting
these goals would be a significant achievement for
artificial intelligence.

Soar/IFOR agents interact with distributed simulations
via the ModSAF simulation package [1]. Each
agent is assigned to a ModSAF simulation of a vehicle,
e.g., an aircraft. Through an abstract interface [8], the
Soar/IFOR agent receives from the vehicle information
similar to what a human controlling the same vehicle in
the real world would receive, such as the position of the
vehicle, the presence of enemy vehicles in the area, etc. The
Soar/IFOR agent interprets the situation based upon
the information received, decides on actions to take,
and communicates these to ModSAF as commands for
the vehicle to execute. Some of the details of psychomotor
control and resource contention are omitted
from the model, e.g., a Soar/IFOR pilot controls its
aircraft by specifying desired altitudes and headings
instead of by simulating stick movements. However,
these abstractions do not simplify the agents’ decision
making task.
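
The level of abstraction of this interface can be pictured with a small Python sketch of one decision step: the agent receives a percept, decides, and returns a command expressed as a desired altitude and heading. The `Percept` and `Command` fields, the `decide` function, and its trivial “turn toward the enemy” policy are hypothetical and are not taken from the ModSAF or Soar/IFOR interfaces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Percept:
    altitude_ft: float
    heading_deg: float
    enemy_bearing_deg: Optional[float]  # None when no enemy is detected


@dataclass
class Command:
    desired_altitude_ft: float
    desired_heading_deg: float


def decide(percept, cruise_altitude_ft=20000.0):
    """Choose a desired altitude and heading; no stick movements are modeled."""
    if percept.enemy_bearing_deg is not None:
        # Placeholder policy (assumption): turn toward the detected enemy.
        heading = percept.enemy_bearing_deg
    else:
        # No enemy in view: hold the current heading.
        heading = percept.heading_deg
    return Command(cruise_altitude_ft, heading)
```

The simulated vehicle is then responsible for flying to the commanded altitude and heading. All of the tactical content sits in the decision step, which is why the abstraction does not make the decision making task any simpler.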

Soar/IFOR has been tested in simulated exercises
incorporating manned simulation devices such as flight
simulators, semi-automated forces, as well as automated
forces. Soar/IFOR agents are assigned missions
prior to the engagement, and are otherwise left to carry
out their missions themselves. Agents are evaluated
according to how appropriately they perform in each
individual engagement.

Although such exercises are useful for demonstrating
agent capabilities, they do not in themselves ensure
that Soar/IFOR agents meet the needs of potential
users of distributed simulations. For example,
in order for users to be certain that agent decision
making is realistic, they need to understand the rationales
for the agent’s decisions. This has led to the
development of an automated explanation capability,
called Debrief, that enables users to engage agents in
a question-answer dialog, in a manner analogous to an
after-action review [3].

3  Learning  in  Soar  Agents 

The air-combat simulation environment, by virtue
of its complex, real-world characteristics, presents
Soar/IFOR agents with a number of challenging functional
and performance requirements. There are also
many ways in which machine learning can help the
agents meet these requirements. Chunking in Soar/IFOR
has been found so far to enable the following functional
capabilities and performance improvements.

•  Decision  making  speeds  up  over  time. 

•  A  memory  of  past  episodes  is  maintained. 

•  Problem  solving  knowledge  is  reorganized  in  order 
to  support  explanation  and  efficient  execution. 

•  Interpretation  of  situations  and  events  improves 
in  quality  with  experience. 

A Soar/IFOR agent engages in some of this learning
on-line, i.e., while it is engaged in simulated combat.
Prime candidates for such on-line learning include
chunking for speedup, episodic memory and knowledge
compilation. However, not all learning can or
should occur on-line. In particular, some of the learning
requires that a Soar/IFOR agent consider the consequences
of its decisions, explore alternative decisions,
and learn from the results. Because of the real-time
pressures of air-to-air combat, a Soar/IFOR agent may
not have the free time to engage in such deliberation.
Time pressures are certainly not continuous: there may
be momentary lulls in activity that could be used for
deliberation and learning, but as yet are not. Instead,
Soar/IFOR agents rely upon off-line analysis for such
learning. An agent waits for the combat situation to terminate,
so that it can analyze past situations without interruption.
This enables the agents to explain their reasoning
during after-action review, for example.

Learned chunks are applied to future decisions in
the following ways. A chunk learned during an engagement
may apply later on within the same engagement.
It may apply during after-action review of the engagement.
Finally, chunks created during a mission or during
after-action review are saved so that they can be
employed by agents in future missions and review sessions,
enabling the agents to learn from accumulated
experience.


3.1  Speeding  Up  Decisions 

In much machine learning research, such as [5],
speedup is measured by comparing problem solving
time after learning to problem solving time without
learning. Such a measure is inappropriate for learning
in Soar/IFOR, because chunking does not yield an
overall speedup, i.e., it does not reduce the overall duration
of the engagement. In other domains such lack
of speedup might be attributable to the high cost of
matching and retrieving the learned chunks [11]. However,
for Soar/IFOR agents, the cost of matching and
retrieving learned rules is not much of an overhead.
Rather, a combination of the following two effects is
at work. First, combat simulation involves performing
(simulated) physical actions and responding to external
events. Learning cannot affect the duration of such
actions and events; at best it can reduce the time required
to decide on an action or interpret an event.
Second, cognitive activity is concentrated in isolated
episodes, separated by periods of relative inactivity.
Speedups in deliberation contribute very little to reductions
in the overall duration of a scenario. For instance,
suppose a Soar/IFOR agent decides to launch
a missile at an opponent. To that end, it must decide
which type of missile to employ, and how best to approach
the opponent’s aircraft. These decisions take
up at most a few seconds. The agent then has to wait,
sometimes for up to a minute or more, while the opponent
gets into its missile firing range. Decision time
thus has little or no effect on overall time to intercept
the opponent.
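
The argument can be made concrete with a short calculation. The Python sketch below computes the overall speedup of an episode when only the deliberation part is accelerated; the function name and the sample figures (3 seconds of deliberation, 60 seconds of waiting, a tenfold faster decision) are illustrative assumptions chosen to match the “few seconds” and “up to a minute” mentioned above, not measurements from the project.

```python
def overall_speedup(decision_s, wait_s, decision_speedup):
    """Ratio of episode duration before learning to duration after learning.

    Only the deliberation time shrinks; the time spent waiting on
    (simulated) physical events is unaffected by learning.
    """
    before = decision_s + wait_s
    after = decision_s / decision_speedup + wait_s
    return before / after


# Illustrative figures: 3 s of deliberation, 60 s of waiting, 10x faster decisions.
ratio = overall_speedup(3.0, 60.0, 10.0)
print(round(ratio, 3))
```

Under these assumed figures the episode shrinks from 63 seconds to 60.3 seconds, an overall speedup of less than 5 percent, even though the decision itself became ten times faster.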

Although learning has little effect on the overall duration
of engagements, it can make a substantial difference
in time-critical situations. In such situations,
small delays in an agent’s action can jeopardize its survival,
or prevent the agent from exploiting momentary
advantages over an opponent. For instance, when a
Soar/IFOR agent fires a missile at its opponent, the
opponent may engage in a missile evasion tactic that
can cause it to break radar contact (disappear from the
Soar/IFOR agent’s radar). The opponent may then
turn quickly to fire a missile at the Soar/IFOR agent.
This is an extremely time-critical situation. When
the opponent turns back after its missile evasion maneuver,
the Soar/IFOR agent obtains a new contact
(blip) on its radar. This blip could be the opponent,
or perhaps a friendly aircraft that has just arrived in
radar range. The Soar/IFOR agent must quickly determine
the contact’s identity, and then launch a second
missile before the opponent fires its missile. If
the Soar/IFOR agent is delayed in re-establishing the
opponent’s identity, it may get shot down. Chunking
can enable Soar/IFOR agents to arrive at important
decisions more rapidly the next time a similar situation
is encountered. The end result is that the agents
can survive longer, and fight better.

A possible way of measuring speedup might be to
measure an agent’s reaction time, i.e., the time from an
external event until the agent’s response to that event.
This presupposes, however, that the stimuli are controlled
so that there is a clear relationship between
stimulus and response. But battlefield engagements
are not like controlled laboratory experiments:
agents are constantly exposed to a variety of
stimuli, and perform a variety of tasks, often at the
same time. Reducing the amount of time required to
interpret one stimulus often has the indirect effect of
enabling the agent to attend to other stimuli that were
previously overlooked, such as a second opponent that
has just arrived in radar range. This clearly can have
an impact on overall agent performance, but in a way
that is difficult to quantify.

3.2  Maintaining  an  Episodic  Memory 

It is useful for Soar/IFOR agents to have an episodic
memory, so that they can recall episodes from previous
engagements during after-action review or subsequent
missions. Episodic memory can be regarded as an aspect
of learning, insofar as the problem solver’s reasoning
after memory formation is different from that
before memory formation. It is instrumental to other
types of learning: for example, if an agent can recognize
that the current situation is similar to previous
situations, it can then apply its previous experience to
the new situation.

We have found that chunking can be readily employed
to address part of the episodic memory problem,
namely to learn to recall the circumstances in
which a given event occurred. That is, when presented
with a description of an event, chunks fire which recreate
a description of the world state that prevailed at
that time. Other aspects of episodic memory, such as
recalling what events occurred as part of a given mission,
are not as yet handled via chunking; the agent
instead simply records the events that occur in a conventional
list data structure.

The  episodic  memory  mechanism  relies  on  two  sets 
of  chunks.  The  first  set  consists  of  recognition  chunks, 
which  are  common  in  a  range  of  Soar  systems.  Recog¬ 
nition  chunks  fire  in  response  to  some  description  that 
serves  as  a  memory  probe,  indicating  that  an  instance 
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matching  the  probe  has  been  seen  before.  In  the 
Soar/IFOR  case,  the  memory  probe  consists  of  a  de¬ 
scription  of  an  event,  together  with  a  possible  state 
change.  If  the  state  ch^uage  occurred  at  the  time 
the  event  was  observed,  the  recognition  chunk  will 
fire.  These  recognition  chunks  are  created  in  a  special 
episodic-memory  subgoal,  which  is  processed  whenever 
the  agent  notices  a  significant  state  change.  The  sec¬ 
ond  set  of  chunks  are  recall  chunks,  which  recall  the 
complete  state  in  which  an  event  occurred,  when  pre¬ 
sented  with  an  event  description  as  a  memory  probe. 
The  first  time  Soar/IFOR  attempts  to  recall  the  state 
associated  with  an  event,  it  first  tries  to  find  an  earlier 
event  for  which  it  can  recall  a  state.  It  then  tries  to 
recall  which  state  changes  occurred  between  the  earlier 
state  and  the  state  of  interest.  The  previously  created 
recognition  chunks  identify  the  relevant  state  changes. 
Once  the  recall  process  is  complete,  a  recall  chunk  is 
created,  so  that  the  next  time  the  event  is  used  as  a 
memory  probe  the  state  is  immediately  recalled. 
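The recall procedure described above can be sketched in Python. This is only an illustration under stated assumptions, not the Soar/IFOR implementation: the class `EpisodicMemory`, its method names, and the use of plain sets and dictionaries in place of recognition and recall chunks are all hypothetical, as is the requirement that the caller supply the candidate state changes used as probes and that event descriptions are unique.

```python
from dataclasses import dataclass, field


@dataclass
class EpisodicMemory:
    # (event, (feature, value)) pairs; stands in for recognition chunks
    recognized: set = field(default_factory=set)
    # event -> complete state; stands in for recall chunks
    recalled: dict = field(default_factory=dict)
    # events in the order they were observed (assumed unique)
    order: list = field(default_factory=list)

    def observe(self, event, changes):
        """Record the state changes (feature -> value) noticed at an event."""
        self.order.append(event)
        for change in changes.items():
            self.recognized.add((event, change))

    def recognize(self, event, change):
        """Recognition probe: did this state change occur at this event?"""
        return (event, change) in self.recognized

    def recall(self, event, candidate_changes):
        """Rebuild the state at an event, then cache it for later probes."""
        if event in self.recalled:
            return dict(self.recalled[event])
        idx = self.order.index(event)
        # Find the latest earlier event whose state can already be recalled.
        start, state = 0, {}
        for i in range(idx - 1, -1, -1):
            if self.order[i] in self.recalled:
                start, state = i + 1, dict(self.recalled[self.order[i]])
                break
        # Replay the changes that recognition confirms between that state
        # and the state of interest.
        for e in self.order[start:idx + 1]:
            for feature, value in candidate_changes:
                if self.recognize(e, (feature, value)):
                    state[feature] = value
        self.recalled[event] = dict(state)
        return state
```

In this sketch the first call to `recall` for an event does the slow reconstruction, while later calls return the cached state directly, which mirrors the role the text assigns to recall chunks.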

Episodic  memory  illustrates  how  chunking  can  serve 
as  an  underlying  mechanism  for  a  variety  of  types  of
learning  besides  simple  speedup.  Such  learning  may 
require  problem  spaces  that  are  specially  designed  to 
generate  particular  types  of  chunks  such  as  recognition 
chunks  or  recall  chunks. 

3.3  Reorganizing  Knowledge 

Chunking  also  enables  Soar/IFOR  agents  to  reorganize
their  knowledge.  In  knowledge-based  systems
generally,  the  form  in  which  knowledge  is  encoded  de¬ 
pends  upon  how  the  knowledge  engineer  intends  the 
knowledge  to  be  used.  Learning  enables  knowledge 
encoded  for  one  purpose,  i.e.,  controlling  the  agent’s 
behavior,  to  be  employed  for  other  purposes,  e.g.,  ex¬ 
plaining  the  agent’s  decisions. 

Soar/IFOR’s  interactive  explanation  capability, 
called  Debrief,  makes  extensive  use  of  chunking  for
knowledge  reorganization  [3].  The  agents  can  explain 
the  rationales  for  decisions  made  during  an  engage¬ 
ment,  by  relating  chosen  decisions  to  the  critical  fac¬ 
tors  in  the  situation  that  led  to  those  decisions.  The 
knowledge  needed  to  generate  such  explanations,  i.e., 
associations  between  decisions  and  sets  of  situational 
factors,  is  different  from  the  knowledge  used  to  gener¬ 
ate  the  decisions  in  the  first  place.  For  one  thing,  the 
process  of  generating  the  decision  may  involve  inter¬ 
nal  reasoning  mechanisms  that  are  of  little  interest  to 
someone  who  is  not  an  agent  developer.  Recognition 
chunks  are  built  which  identify  the  key  factors  leading 
to  a  decision  in  a  given  situation.  This  is  accomplished
by  reconsidering  the  decisions  after  the  engagement  is
over,  and  proposing  hypothetical  changes  to  the  situa¬ 
tion  in  which  the  decision  was  made.  The  set  of  state 
features  that  prove  significant,  because  altering  them 
alters  the  outcome  of  the  decision,  is  saved  in  a  chunk. 
If  the  agent  is  asked  to  explain  a  similar  decision  in  a 
similar  situation,  the  recognition  chunk  will  fire  iden¬ 
tifying  those  features  of  the  situation  that  should  be 
included  in  the  explanation. 
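The idea of finding key factors by proposing hypothetical changes can be illustrated with a short Python sketch. It is not Debrief's actual code: the function `significant_features`, the toy decision rule `toy_decide`, and the feature names are assumptions made purely for illustration.

```python
def significant_features(decide, situation, alternatives):
    """Return the decision and the features whose change would alter it.

    decide: function mapping a situation (dict) to a decision
    situation: feature -> value at the time the decision was made
    alternatives: feature -> list of hypothetical replacement values
    """
    actual = decide(situation)
    significant = set()
    for feature, values in alternatives.items():
        for value in values:
            # Change one feature at a time and reconsider the decision.
            hypothetical = {**situation, feature: value}
            if decide(hypothetical) != actual:
                significant.add(feature)
                break
    return actual, frozenset(significant)


def toy_decide(s):
    # Hypothetical decision rule, used only to exercise the sketch.
    if s["target_in_range"] and s["radar_lock"]:
        return "fire-missile"
    return "hold-fire"
```

The returned pair plays the part of the saved chunk: for a similar decision in a similar situation, only the features in the second element would need to appear in the explanation.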

Knowledge  reorganization  also  allows  knowledge  or¬ 
ganized  for  ease  of  knowledge  engineering  to  be  ren¬ 
dered  in  a  form  suitable  for  efficient  execution.  The 
Soar/IFOR  project  is  developing  a  variety  of  types  of 
agents,  among  which  only  some  knowledge  is  shared. 
Rules  therefore  tend  to  be  factored  so  as  to  separate 
the  shared  knowledge  from  the  unshared  knowledge. 
Chunking  is  used  in  some  cases  to  combine  this  knowl¬ 
edge  into  larger  agent-specific  rules,  thus  reducing  the 
number  of  rules  that  must  execute.  This  happens  be¬ 
cause  chunking  summarizes  the  results  of  all  rules  that 
are  executed  in  a  subgoal,  in  the  form  of  a  single  rule 
that  represents  their  combined  effect.  Agent  develop¬ 
ers  are  thus  free  to  encode  the  knowledge  in  a  factored 
form,  with  the  expectation  that  the  factored  rules  will 
be  combined  when  they  are  executed  by  the  agent. 

3.4  Improving  Situation  Interpretations 

Accurate  interpretation  of  the  rapidly  evolving
battlefield  situation  is  key  to  a  Soar/IFOR  agent’s  successful
task  performance.  One  important  component
of  such  an  interpretation  is  accurate  tracking  of  an 
opponent’s  ongoing  actions,  to  infer  her  higher  level 
goals,  plans  or  behaviors.  For  instance,  a  Soar/IFOR 
agent  cannot  actually  observe  a  missile,  but  needs  to 
infer  a  missile  firing  based  on  the  opponent’s  maneu¬ 
vers,  as  shown  in  Figure  1.  Here,  the  Soar/IFOR  agent 
is  piloting  the  dark-shaded  aircraft  and  its  opponent
the  light-shaded  one.  In  Figure  1-a  the  two  aircraft
are  on  collision  course:  if  they  fly  straight  they  will
collide  at  the  point  shown  by  x.  After  reaching  her 
missile  firing  range,  the  opponent  turns  her  aircraft  to 
point  at  the  Soar/IFOR  agent’s  aircraft  (see  Figure 
1-b).  In  this  situation,  the  opponent  fires  a  missile. 
She  then  turns  45  degrees  (an  Fpole  turn)  to  provide
radar  guidance  to  the  missile,  while  slowing  the  closure
between  the  two  aircraft.  The  Soar/IFOR  agent  can¬ 
not  observe  this  missile,  but  based  on  the  opponent’s 
turn  to  point  at  its  aircraft  and  the  subsequent  Fpole 
turn,  it  needs  to  infer  that  the  opponent  has  fired  a 
missile. 

Figure  1:  Tracking  an  opponent’s  normal  missile  firing
maneuvers.  An  arc  on  an  aircraft’s  nose  indicates  its
turn  direction.  The  missile  is  indicated  by

Unfortunately  for  the  Soar/IFOR  agents,  the  human
pilots  in  the  STOW-E  exercise  (see  Section  1)
were  briefed  as  to  what  cues  Soar/IFOR  looks  for 
when  interpreting  opponent  actions,  and  how  they 
might  be  able  to  fool  Soar/IFOR  by  avoiding  these 
cues.  They  deliberately  modified  their  missile  firing
behavior  to  fire  missiles  while  maintaining  a  25-degree
angle-off  (i.e.,  pointing  25  degrees  away  from
Soar/IFOR  agents’  aircraft).  The  Soar/IFOR  agents 
failed  to  track  the  missile  firing  and  got  shot  down.  Of 
course,  human  pilots  are  bound  to  come  up  with  novel 
variations  on  known  maneuvers,  and  the  Soar/IFOR 
agents  cannot  be  expected  to  anticipate  them.  Yet, 
at  the  same  time,  agents  cannot  remain  in  a  state 
of  permanent  vulnerability  (for  instance,  getting  shot
down  each  time  the  25-degree  variation  is  used);
otherwise  they  would  be  unable  to  provide  a  challenging
and  appropriate  training  environment  for  human
pilots. 

The  Soar/IFOR  agents  must  adapt  their  opponent 
tracking  to  counter  such  adaptive  behavior  on  the  part 
of  humans.  To  this  end,  we  are  developing  the  ca¬ 
pability  to  analyze  the  past  combat  episodes  off-line, 
and  learn  from  obvious  errors.  In  the  above  case,  the 
Soar/IFOR  agent  records  in  its  episodic  memory  that 
it  got  shot  down.  Its  episodic  memory  of  the  com¬ 
bat  also  reveals  that  it  never  detected  the  opponent’s 
missile  firing  behavior.  Simultaneously,  however,  the
episodic  memory  will  note  that  the  agent  did  face  a 
mysterious  maneuver  that  it  was  unable  to  track  (cor¬ 
responding  to  the  missile  firing  with  a  25-degree  angle- 
off).  Based  on  this  episodic  memory,  the  agent  can 
learn  that  the  human  pilot  can  fire  a  missile  from  a 
25-degree  angle-off. 
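The off-line analysis just described might look roughly like the following Python sketch. The text says this capability is still under development, so everything here is an assumption for illustration: the function name `analyze_episode`, the episode dictionary and its keys, and the idea of representing tracking cues as a set of angle-offs.

```python
def analyze_episode(known_firing_angle_offs, episode):
    """Off-line review of one engagement (illustrative only).

    known_firing_angle_offs: set of angle-offs (degrees) already treated
        as missile-firing cues
    episode: dict with keys 'shot_down', 'missile_firing_detected' and
        'untracked_angle_offs' (angle-offs of maneuvers the agent could
        not interpret during the engagement)
    """
    learned = set(known_firing_angle_offs)
    if episode["shot_down"] and not episode["missile_firing_detected"]:
        # An obvious error: the agent was hit without ever tracking a
        # firing, so attribute the firing to the unexplained maneuvers.
        learned |= set(episode["untracked_angle_offs"])
    return learned
```

The function returns a new set rather than modifying its input, so the cues in force before the review remain available for comparison.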


4  Practical  Aspects  of  Using  Chunking 

Given  the  Soar/IFOR  agents’  real-world  environ¬ 
ment,  the  costs  and  benefits  of  chunking  have  to  be 
evaluated  from  a  practical  perspective.  The  key  question
here  is:  Do  the  benefits  of  chunking  outweigh  its
costs  as  it  stands  today?  In  this  regard,  the  following
factors  need  to  be  taken  into  account: 

1.  The  Soar/IFOR  agents’  current  knowledge  is  al¬ 
ready  encoded  in  a  highly  optimized  form,  so  that 
they  can  rapidly  respond  to  opponents’  maneu¬ 
vers.  It  is  difficult  for  chunking  to  improve  upon 
such  decisions,  other  than  to  reorganize  the  en¬ 
coded  knowledge  somewhat,  as  described  above. 

2.  The  agents’  current  knowledge  is  the  result  of  ex¬ 
tensive  knowledge  acquisition  sessions.  Some  of 
the  tactical  knowledge  gained  from  these  sessions 
is  highly  sophisticated  and  a  result  of  careful  anal¬ 
ysis  of  the  capabilities  of  the  opposing  forces.  It 
is  difficult  for  chunking  techniques  to  reconstruct, 
much  less  improve  on,  this  expertise. 

3.  Chunks  learned  are  sometimes  highly  specific — 
their  conditions  refer  to  the  agent’s  current  situ¬ 
ation  in  terms  of  the  value  of  its  altitude,  speed, 
range  from  an  opponent,  etc.  Such  chunks  do  not 
transfer  (apply)  to  other  similar  situations,  thus 
reducing  the  effectiveness  of  chunking. 

4.  The  learning  process  itself  can  incur  development 
overhead.  Modifications  to  agent  code  can  in¬ 
validate  previously  created  chunks.  Thus  as  the 
agents  are  modified,  training  sessions  must  be  run 
repeatedly  in  order  to  produce  an  up-to-date  set 
of  chunks. 

The  above  practical  issues  in  applying  chunking, 
combined  with  our  earlier  observations  regarding  the 
lack  of  overall  speedups,  imply  that  on-line  chunking
has  to  be  very  carefully  applied,  if  at  all,  in  service  of 
speedups.  We  find  it  expedient  to  turn  chunking  on 
when  the  agents  are  making  certain  types  of  decisions,
and  turn  it  off  elsewhere. 


5  Long-Term  Prospects 

As  development  of  Soar/IFOR  proceeds,  new  op¬ 
portunities  continue  to  present  themselves  for  making 
more  extensive  use  of  machine  learning,  and  to  em¬ 
ploy  existing  learning  abilities  in  new  ways.  Episodic 
memory  is  a  good  example  of  the  latter:  once  an  agent 
has  the  ability  to  remember  previous  episodes,  a  va¬ 
riety  of  possibilities  for  learning  from  those  episodes 
present  themselves.  As  the  added  capabilities  afforded 
by  machine  learning  accumulate,  and  the  costs  asso¬ 
ciated  with  learning  are  mitigated,  the  benefits  stem¬ 
ming  from  learning  are  expected  to  dominate  the  costs 
to  a  greater  and  greater  extent. 


There  is  reason  to  believe,  in  fact,  that  eventually 
further  improvement  in  performance  of  Soar/IFOR 
agents  will  only  be  achievable  by  means  of  machine 
learning.  As  long  as  the  decision  making  of  Soar/IFOR 
agents  is  governed  by  fixed  rules,  wily  human  opponents
will  learn  ways  of  gaining  advantages  over  the
agents.  This  will  be  especially  true  if  and  when  these 
agents  are  integrated  into  training  devices  that  are 
used  on  a  routine  basis.  If  current  work  on  enabling 
Soar/IFOR  to  learn  from  experience  can  be  applied 
to  a  range  of  situations  and  scenarios,  then  human 
trainees  will  find  simulations  to  be  continually  challenging,
and  able  to  put  their  tactical  skills  fully  to
the  test.


Abstract

TacAir-Soar  is  a  reactive  system  that  uses
recognition-driven  problem  solving  to  plan  and 
generate  behavior  in  the  domain  of  tactical  air 
combat  simulation.  Our  current  research  efforts 
focus  on  integrating  more  deliberative  planning 
and  learning  mechanisms  into  the  system.  This 
paper  discusses  characteristics  of  the  domain  that 
influence  potential  planning  solutions,  together 
with  our  approach  for  integrating  reactive  and
deliberative  planning. 


TacAir-Soar  (Jones  et  al.  1993;  Rosenbloom  et  al.
1994)  implements  artificial,  intelligent  agents  for  use  in 
tactical  flight  training  simulators.  The  overall  goal  of 
the  project  is  to  create  automatic  agents  that  generate 
behavior  as  similar  as  possible  to  humans  flying  flight 
simulators.  These  agents  will  help  provide  relatively 
cheap  and  effective  training  for  Navy  pilots. 

In  order  to  accomplish  this  task,  we  need  not  only  to 
acquire  and  encode  a  large  amount  of  complex  knowl¬ 
edge,  but  also  to  address  a  number  of  core  research  is¬ 
sues  within  artificial  intelligence.  Not  the  least  of  these 
issues  is  the  ability  for  the  agent  to  plan  its  activities 
appropriately,  and  to  acquire  efficient  and  effective  new
behaviors  as  a  consequence  of  planning. 

We  are  investigating  the  hypothesis  that  a  variety  of 
appropriate  behaviors  can  arise  from  a  system  with  a 
small,  organized  set  of  cognitive  mechanisms  as  it  inter¬ 
acts  with  a  complex  environment.  Thus,  the  primary 
thrust  of  our  research  relies  on  integration  in  a  number
of  different  forms.  Reactive  behavior  generation
must  be  integrated  with  goal-directed  reasoning  and
planning.  These  in  turn  must  be  integrated  with  other
cognitive  capabilities,  such  as  situation  interpretation,
natural  language  understanding  and  generation,  plan
recognition,  planning,  etc.  Rather  than  combining  distinct
modules  for  execution,  planning,  and  learning,
we  are  attempting  to  integrate  all  of  these  capabilities 
within  a  single  control  scheme.  Thus,  planning  be¬ 
comes  simply  another  form  of  execution,  which  must 
interact  with  other  knowledge  in  order  to  generate  appropriate
behavior.  Learning  occurs  as  a  side  effect  of
execution,  manifesting  itself  in  different  ways  depend¬ 
ing  on  the  particular  tasks  being  executed.  Because  of 
the  incremental,  dynamic,  and  complex  nature  of  be¬ 
havior  generation  in  the  tactical  air  domain,  learning 
must  also  be  incremental,  fast,  and  able  to  capture  the 
complexities  of  goals  and  actions. 

The  current  version  of  TacAir-Soar  combines  re¬ 
active  and  goal-driven  reasoning  to  create  what  we 
call  recognition-driven  problem  solving  (Tambe  et  al.
1994).  The  system  contains  a  large  set  of  rules  that 
fire  as  soon  as  their  conditions  are  met,  without  search 
or  conflict  resolution.  Some  of  these  rules  respond  to 
immediate  changes  in  sensory  inputs,  while  others  re¬ 
spond  to  higher-level  interpretations  of  those  changes 
and  goals  that  the  system  posts  for  itself.  As  an  example,
TacAir-Soar  may  observe  a  series  of  readings
about  a  contact  on  its  radar,  and  conclude  that  the
contact  is  an  aggressive  enemy  aircraft.  Thus,  the  system
posts  a  goal  of  intercepting  the  aircraft,  which  involves
maintaining  a  collision  course.  The  actual  heading
of  TacAir-Soar’s  aircraft  will  change  every  time
the  collision  course  changes.  This  paradigm  for  be¬ 
havior  generation  is  similar  to  reactive  planning  in  the 
spirit  of  Firby’s  (1987)  RAP  planners.  That  is,  the 
system  does  not  perform  any  search  to  determine  the 
best  course  of  action,  and  it  does  not  plan  in  terms  of 
predicting  future  states  of  the  environment.¹  It  also
computes  its  behavior  dynamically,  rather  than  generating
a  declarative  plan  that  is  later  interpreted.  Part
of  our  current  research  effort  is  to  equip  TacAir-Soar
with  a  deliberative  planning  component  that  separates
planning  from  normal  execution  by  projecting  future
possible  states  and  searching  through  them  to  decide
on  appropriate  courses  of  action.

¹  TacAir-Soar  agents  do  some  prediction,  but  it  is  part
of  normal  behavior  generation,  and  not  something  that  is
learned  about  for  decision  making.

Of  the  following  three  sections,  the  first  provides  a
short  motivation  for  the  usefulness  of  deliberative  planning
in  the  tactical  air  domain.  The  second  lists  a
number  of  characteristics  of  the  domain  that  have  a
significant  impact  on  how  planning  must  occur.  These
characteristics  have  been  discussed  in  various  earlier
work  on  planning,  but  our  work  will  address  all  of
them  together  and  attempt  to  provide  a  planning  solution
that  naturally  integrates  into  recognition-driven
problem  solving.  The  final  section  sketches  potential
solutions  for  deliberative  planning.  These  solutions  are
suggested  by  a  combination  of  the  characteristics  of  the
domain,  our  desire  for  a  fully  integrated  system,  and
the  problem-solving  and  learning  paradigms  provided
by  the  Soar  architecture.

Advantages  of  Deliberative  Planning

As  mentioned  previously,  the  overall  goal  for  the
TacAir-Soar  system  is  to  generate  human-like  behavior
within  the  simulated  environment.  One  hallmark
of  human  behavior  is  flexibility  in  the  face  of
new  situations.  The  current  system  has  been  equipped
with  a  large  knowledge  base  of  tactics,  vehicle  dynamics,
weapons  characteristics,  etc.,  and  this  allows  the
system  to  generate  a  wide  variety  of  behaviors  in  response
to  different  situations,  missions,  and  goals.  One
approach  to  this  type  of  domain  has  been  to  attempt  to
capture  every  possible  situation  that  an  agent  may  encounter
in  a  recognition  rule  (e.g.,  Bimson  et  al.  1994).
However,  even  if  such  an  approach  is  possible,  it  would
require  extensive  work  on  the  knowledge  base  every
time  the  domain  changes  a  bit  (for  example,  if  new
aircraft  or  missiles  are  developed).

In  response  to  this  difficulty,  an  agent  must  detect
when  it  does  not  have  suitable  knowledge  to  react  to  a
particular  situation,  and  use  its  planning  capabilities  to
generate  appropriate  actions  based  on  more  fundamental
knowledge.  This  requires  the  agent  to  integrate  deliberative
planning  with  its  current  recognition-driven
reasoning  mechanisms.  Naturally,  we  also  expect  the
agent  to  learn  from  its  planning  episodes,  generating
new  rules  for  future  similar  situations.

TacAir-Soar  will  do  much  of  its  planning  “in  the
air,”  where  there  are  tight  restrictions  on  time,  thus
limiting  the  learning  opportunities.  However,  human
pilots  often  learn  by  flying  real  or  simulated  scenar¬ 
ios,  and  then  debriefing  the  scenarios  on  the  ground. 
By  going  back  over  each  step  of  the  scenario,  the  pi¬ 
lot  can  identify  successes  and  failures,  consider  alter¬ 
native  courses  of  action,  and  take  more  time  to  eval¬ 
uate  various  possible  outcomes.  Automated  agents 
have  also  been  demonstrated  to  benefit  from  such  self-explanations
(VanLehn,  Jones,  &  Chi  1992).  In  addition,
Johnson  (1994a;  1994b)  has  presented  a  debriefing
facility,  in  which  TacAir-Soar  agents  can  explain
their  actions  after  a  scenario,  and  consider  some  hypo¬ 
thetical  alternatives.  The  deliberative  planning  mech¬ 
anism  should  expand  on  this  approach  and  allow  the 
system  to  learn  from  the  debriefing  experience.  In  ad¬ 
dition,  we  intend  the  same  planning  mechanism  to  be 
used  for  planning  both  in  the  dynamic  environment  of 
an  engagement  and  the  calm,  slow-paced  environment 
of  a  debriefing  session.  Naturally,  when  the  agent  has 
more  time  to  plan,  the  quality  and  quantity  of  effective 
learning  should  increase,  but  this  will  be  due  to  the  dy¬ 
namics  of  the  planning  situation,  not  because  of  any 
differences  in  the  planning  and  learning  mechanisms. 

Planning  Issues  for  Tactical  Flight 

This  section  focuses  on  the  specific  aspects  of  the  tac¬ 
tical  air  domain  that  have  a  significant  impact  on  how 
planning  should  be  carried  out.  There  are  five  par¬ 
ticular  characteristics  that  set  the  domain  apart  from 
traditional  domains  used  in  planning  research. 

Interaction  of  Domain  Goals 

The  current  version  of  TacAir-Soar  knows  about  almost
100  different  types  of  goals,  and  many  of  these
interact  with  each  other.  For  example,  there  are  times 
when  an  agent  wants  simultaneously  to  fly  toward  a 
target,  evade  an  incoming  missile,  and  maintain  radar 
contact  with  another  aircraft.  This  presents  the  traditional
problem  of  planning  for  goal  conjuncts  (Chapman
1987;  Covrigaru  1992).  However,  we  must  trade
off  the  intensive  search  that  can  be  involved  in  this 
type  of  planning  with  the  dynamic  and  uncertain  na¬ 
ture  of  the  task  (discussed  below).  Other  researchers 
(e.g.,  Cohen  et  al.  1989;  Veloso  1989)  have  suggested
methods  for  planning  about  conjunctive  goals  in  real 
time,  and  we  hope  to  borrow  from  these  approaches  in 
our  own  efforts. 

Two  primary  elements  of  conjunctive  goal  planning 
are  detecting  a  goal  interaction  and  then  finding  a  way 
to  deal  with  the  interaction.  Within  TacAir-Soar,
interactions  will  generally  be  detected  when  conflict¬ 
ing  output  commands  are  sent  to  the  simulator  (e.g., 
to  come  to  two  different  headings)  or  when  goal  constraints
are  incompatible  (e.g.,  turning  away  from  a
target  while  also  maintaining  a  radar  lock).  In  general, 
there  will  be  two  methods  for  dealing  with  such  goal 
interactions.  Some  goals  can  be  achieved  conjunctively 
(perhaps  not  as  efficiently  as  if  the  goals  were  indepen¬ 
dent),  but  sometimes  it  will  be  necessary  to  suspend 
certain  goals  temporarily  when  goals  of  higher  prior¬ 
ity  (such  as  evading  an  incoming  threat)  conflict  with 
them. 
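Detecting conflicting heading commands and suspending lower-priority goals, as outlined above, can be sketched as follows. This is a minimal illustration rather than TacAir-Soar's mechanism: the `Goal` class, the numeric priorities, the single heading channel, and the 5-degree tolerance are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    name: str
    priority: int      # larger means more urgent (assumed convention)
    heading: float     # heading in degrees that this goal wants commanded


def angle_diff(a, b):
    """Smallest absolute difference between two headings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def resolve_heading(goals, tolerance=5.0):
    """Split goals into those pursued now and those suspended.

    Goals are considered in priority order. A goal is suspended when its
    desired heading conflicts with a higher-priority goal already active;
    goals whose headings agree within `tolerance` are pursued conjunctively.
    """
    ranked = sorted(goals, key=lambda g: g.priority, reverse=True)
    active, suspended = [], []
    for goal in ranked:
        conflict = any(
            angle_diff(goal.heading, other.heading) > tolerance
            for other in active
        )
        (suspended if conflict else active).append(goal)
    return active, suspended
```

A suspended goal is simply set aside here; in a fuller treatment it would be reconsidered once the higher-priority goal (such as evading an incoming threat) has been achieved.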

Dynamic,  Real-Time  Environment 

As  suggested  above,  TacAir-Soar  cannot  generally 
assume  that  it  has  ample  time  to  plan.  An  agent  may 
be  planning  an  intercept  course  to  a  target  when  it 
detects  an  incoming  missile.  In  this  case,  the  agent 
must  interrupt  its  planning  in  order  to  react  in  a  timely 
fashion.  As  a  slightly  different  case,  the  situation  may 
change  so  rapidly  that  the  conditions  that  initiated 
planning  may  become  obsolete  before  planning  is  com¬ 
pleted.  For  example,  the  agent  may  begin  planning 
which  type  of  weapon  it  should  employ  against  a  tar¬ 
get,  only  to  find  it  destroyed  by  some  other  partici¬ 
pant  in  the  engagement.  In  both  of  these  situations, 
the  system  should  cease  its  planning  activity,  even  if  it 
did  not  find  a  result.  Reactive  planning  systems  (e.g., 
Agre  &  Chapman  1987;  Firby  1987;  Kaelbling  1986), 
and  TacAir-Soar’s  recognition-driven  problem  solv¬ 
ing  address  some  of  these  issues  by  dynamically  chang¬ 
ing  goals  and  behaviors  as  the  environment  changes. 
The  next  challenge  is  to  integrate  deliberative  plan¬ 
ning  with  dynamic  reasoning  in  a  smooth  way. 
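The two interruption cases above (an urgent event, and a planning trigger that has become obsolete) can be illustrated with a small Python sketch. It is a simplified assumption about how such a loop might be organized, not a description of TacAir-Soar; the function `plan_interruptibly` and its callback parameters are hypothetical.

```python
def plan_interruptibly(steps, still_relevant, urgent_event):
    """Run planning one step at a time, stopping if interrupted.

    steps: iterable of zero-argument callables, each one unit of planning
    still_relevant: callable returning False once the conditions that
        initiated planning are obsolete
    urgent_event: callable returning True when the agent must react now
    Returns a status string and the partial results produced so far.
    """
    partial = []
    for step in steps:
        if urgent_event():
            return "interrupted", partial
        if not still_relevant():
            return "abandoned", partial
        partial.append(step())
    return "completed", partial
```

Checking both conditions before every step keeps the delay before a reaction bounded by the cost of a single planning step, and the partial results are returned rather than discarded.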

Large  State  Representation

A  further  characteristic  of  the  domain  is  that  it  in¬ 
volves  rather  large  representations  of  the  agent’s  cur¬ 
rent  situation.  The  state  representation  includes  infor¬ 
mation  about  various  vehicle  and  weapon  types,  sen¬ 
sor  information  (from  visual,  radar,  and  radio  sources), 
the  agent’s  current  mission  goals,  other  “mental”  annotations,
and  interpretations  of  the  state,  actions,
and  goals  of  other  agents.  For  normal  recognition- 
driven  problem  solving,  the  situated  TacAir-Soar 
agent  simply  reacts  to  various  features  in  this  large 
state  by  generating  actions  or  posting  new  goals  or 
new  interpretations  of  the  situation. 

The  size  of  the  state  can  impact  deliberative  plan¬ 
ning  in  three  ways.  First,  any  time  the  agent  wishes  to 
plan,  it  must  construct  a  copy  of  its  current  state  rep¬ 
resentation.  It  can  then  manipulate  this  copy  without 
changing  its  actual  representation  of  the  world  or  is¬ 
suing  real  behaviors.  Second,  separating  the  two  state 
representations  allows  the  system  to  generate  low-level 
reactions  in  response  to  one  state  while  planning  with 
the  other.  Because  it  takes  some  time  to  create  this
mental  planning  state,  the  agent  should  copy  only  the
necessary  information  for  planning  and  no  more.  Fi¬ 
nally,  some  of  the  state  information  will  be  important 
to  the  current  plan,  while  other  information  will  be  less 
important  or  totally  irrelevant.  It  is  not  desirable  for 
the  agent  to  reason  about  portions  of  the  state  that 
have  no  bearing  on  the  current  decision.  Thus,  decisions
about  how  much  state  to  copy  will  have  an  impact
on  learning  and  the  generality  of  new  behaviors. 
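Copying only the part of the state that planning needs can be sketched in Python as below. The function `make_planning_state` and the flat dictionary representation of the state are assumptions for illustration; how the relevant features are chosen is exactly the open question the text raises and is not addressed by the sketch.

```python
import copy


def make_planning_state(world_state, relevant_features):
    """Build a separate planning state from selected features.

    The copy is deep, so the planner can modify it freely without
    changing the agent's actual representation of the world.
    """
    return {
        name: copy.deepcopy(world_state[name])
        for name in relevant_features
        if name in world_state
    }
```

Because the planner sees only the copied features, anything it learns from the planning episode can depend only on those features, which is one way the amount of state copied affects the generality of what is learned.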

Planning  in  the  Face  of  Uncertainty 

A  key  feature  of  the  tactical  air  domain  is  that  there  is 
generally  a  large  number  of  participants  in  any  given 
scenario.  Some  research  (e.g.,  Georgeff  1984)  has  fo¬ 
cused  on  this  problem,  and  it  naturally  will  have  a 
strong  effect  on  how  TacAir-Soar  can  interpret  and 
predict  the  consequences  of  its  actions  while  plan¬ 
ning.  Anticipating  the  actions  of  cooperating  agents 
may  not  be  too  difficult,  because  there  exist  social 
engagements  and  standard  operating  procedures  be¬ 
tween  agents  that  cooperate.  Predicting  the  future 
actions  of  competing  agents  is  somewhat  more  diffi¬ 
cult,  and  relies  in  part  on  recognizing  the  plans  and 
goals  of  those  agents  (Tambe  &  Rosenbloom  1994; 
Wilensky  1981). 

Given  the  unpredictable  nature  of  modeling  other 
agents,  it  is  most  appropriate  for  TacAir-Soar  to 
create  completable  plans  (Gervasio  &  DeJong  1994), 
in  order  to  react  appropriately  to  future  actions  by 
other  agents.  Contingency  plans  (Warren  1976)  might 
also  be  useful,  but  these  are  generally  expensive  to 
generate.  In  a  sense,  TacAir-Soar’s  current  knowl¬ 
edge  base  consists  of  a  large  completable  plan,  and 
such  planning  is  consistent  with  our  desire  to  integrate 
the  current  recognition-driven  problem-solving  struc¬ 
ture  with  deliberative  planning.  The  results  of  delib¬ 
erative  planning  should  be  completable,  reactive  plans 
that  the  agent  can  execute  and  adapt  in  response  to 
the  dynamics  of  the  environment. 

Termination  of  Planning 

As  we  have  already  mentioned,  available  time  will  have 
a  large  impact  on  how  long  any  planning  activity  can 
continue.  However,  termination  of  planning  is  also  in¬ 
fluenced  by  when  results  can  be  produced.  Most  tradi¬ 
tional  planners  have  small  sets  of  explicit,  well-defined 
goals,  and  a  precise  evaluation  function,  so  they  can 
plan  until  a  method  is  found  to  achieve  their  goals.
Within  the  tactical  air  domain,  there  are  many  different
types  of  goals,  and  different  degrees  to  which  they
can  be  achieved.  As  an  example,  if  an  aircraft  has  the 
mission  to  protect  its  aircraft  carrier,  it  may  produce 
the  goal  of  destroying  an  incoming  attack  aircraft.  After
the  engagement  has  proceeded,  the  agent  may  find
itself  drifting  from  the  carrier  it  is  supposed  to  protect.
At  this  point,  it  may  decide  that  it  has  completed  its 
mission  by  “scaring  off”  the  threat,  without  actually
destroying  it,  and  it  would  be  more  dangerous  to  con¬ 
tinue  than  to  return  to  its  patrol  position. 

The  combination  of  limited  reasoning  time  and  ill- 
defined  goals  provides  a  further  complexity  for  plan¬ 
ning.  The  question  is  how  far  the  planning  pro¬ 
cess  should  continue,  and  when  evaluation  should  take 
place. 

Solutions  for  Deliberative  Planning 

These  characteristics  all  have  an  impact  on  how  planning
can  occur  in  an  intelligent  agent.  Many  of  these
issues  have  been  addressed  to  some  extent  in  previous 
research,  but  we  hope  to  build  an  integrated  system 
that  addresses  all  of  them.  This  section  describes  our 
preliminary  efforts  to  develop  an  integrated  planning 
solution  that  addresses  all  of  the  complexities  of  the 
domain.  It  begins  with  a  discussion  of  the  overall  in¬ 
tegrated  framework,  and  then  describes  specific  ideas 
for  each  of  the  planning  issues. 

Integrated  Planning,  Learning,  and  Execution

Our  commitment  to  an  integrated  system  began  with 
our  selection  of  the  Soar  architecture  (Laird,  Newell, 
&  Rosenbloom  1987)  as  the  platform  for  develop¬ 
ment.  Soar  provides  an  ideal  basis  for  recognition- 
driven  problem  solving,  and  naturally  supports  the  in¬ 
tegration  of  execution,  planning,  and  learning  (Laird 
&  Rosenbloom  1990).

Readers  familiar  with  Soar  will  recall  that  all  reasoning
and  behavior  generation  takes  place  in  problem
spaces,  through  the  deliberate  selection  of  operators.  A 
fair  amount  of  research  on  traditional  planning  within 
Soar  (e.g.,  Lee  1994;  Rosenbloom,  Lee,  &  Unruh  1992)
also  organizes  planning  knowledge  as  sets  of  problem
spaces.  Problem  spaces  are  collections  of  knowledge 
that  address  subgoals,  which  arise  in  response  to  a  lack 
of  knowledge  in  a  particular  situation.  A  typical  exam¬ 
ple  for  planning  occurs  when  an  agent  has  a  number  of 
candidate  actions  to  take,  but  does  not  have  the  knowl¬ 
edge  to  decide  between  them.  For  example,  a  pilot 
must  decide  which  type  of  weapon  to  employ  against 
a  target,  given  the  current  mission  and  circumstances 
of  the  environment.  After  planning  knowledge  (e.g., 
a  mental  simulation  of  the  alternatives)  suggests  an 
ordering,  the  automatic  learning  mechanism  summa¬ 
rizes  search  in  the  problem  space  into  individual  rules 
(“chunks”)  that  will  apply  in  future  similar  situations. 
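The way a learned rule summarizes subgoal processing can be approximated in a few lines of Python. This is a loose analogy and not Soar's chunking algorithm: the classes `TrackedState` and `Chunker`, and the toy `choose_weapon` procedure with its feature names and thresholds, are assumptions made for the example. The key idea shown is that the learned rule tests only the features that the deliberate reasoning actually examined.

```python
class TrackedState(dict):
    """Dict that records which features are read with state[key]."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.touched = set()

    def __getitem__(self, key):
        self.touched.add(key)
        return super().__getitem__(key)


class Chunker:
    def __init__(self, solve):
        self.solve = solve     # deliberate procedure; must read via state[key]
        self.chunks = []       # learned (conditions, result) rules
        self.subgoals = 0      # how often deliberate reasoning was needed

    def decide(self, state):
        # Recognition: fire the first learned rule whose conditions match.
        for conditions, result in self.chunks:
            if all(state.get(k) == v for k, v in conditions.items()):
                return result
        # No rule applies: reason deliberately, then summarize as a rule.
        self.subgoals += 1
        tracked = TrackedState(state)
        result = self.solve(tracked)
        conditions = {k: state[k] for k in tracked.touched}
        self.chunks.append((conditions, result))
        return result


def choose_weapon(s):
    # Hypothetical stand-in for a mental simulation of the alternatives.
    if s["range_nm"] > 10:
        return "long-range-missile"
    return "short-range-missile" if s["aspect"] == "head-on" else "guns"
```

In this sketch a second situation that differs only in features the reasoning never examined is handled by the learned rule without a new subgoal, which is the sense in which such rules apply in future similar situations.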

We  should  stress  the  point  that  the  natural  representation
for  a  plan  within  TacAir-Soar  is  not  a
declarative  script  of  actions.  Rather,  a  plan  is  a  collection
of  recognition-driven  rules  and  operators  that
apply  opportunistically  in  response  to  particular  pat¬ 
terns  of  sensor  values,  interpretations,  and  goals.  Thus, 
in  a  sense,  TacAir-Soar  will  never  be  learning  entire
plans,  but  it  will  be  repairing  or  completing  the  general 
plan  composed  of  all  of  its  recognition  rules. 

Addressing  Domain  Issues 

This  integrated  framework  suggests  possible  solutions 
for  planning  that  also  address  the  issues  presented  ear¬ 
lier.  To  begin  with,  the  high  degree  of  interaction  be¬ 
tween  goals  suggests  criteria  for  both  triggering  and 
evaluating  new  plans.  Previously,  we  suggested  that 
planning  occurs  when  TacAir-Soar  does  not  have 
the  reactive  knowledge  necessary  to  choose  between 
competing  actions.  This  can  be  generalized  to  initiat¬ 
ing  planning  any  time  the  system  detects  an  interac¬ 
tion  between  goals  that  it  does  not  know  how  to  han¬ 
dle.  Covrigaru  (1992)  and  Lee  (1994)  have  investigated 
planning  methods  within  Soar  to  address  interactions 
between  different  types  of  goals.  Evaluation  of  poten¬ 
tial  plans  will  be  based  on  the  resolution  of  individual 
interactions (as opposed to, for example, planning
exhaustively until all interactions are resolved). As the
agent  develops  responses  to  individual  interactions,  it 
can  learn  partial  planning  results  in  the  form  of  new 
recognition  rules. 

These  partial  results  also  address  the  dynamic  char¬ 
acteristics  of  the  domain.  Such  planning  will  inte¬ 
grate  smoothly  with  normal  behavior  generation  be¬ 
cause  every  planning  episode  will  cause  the  system 
to  learn  something.  If  it  is  not  something  that  com¬ 
pletely  resolves  the  current  situation,  it  should  at  least 
allow  the  planning  process  to  resume  later  without  hav¬ 
ing  to  start  over.  Thus,  particular  planning  efforts 
can  be  temporarily  suspended  (or  perhaps  abandoned 
entirely)  without  having  been  a  total  waste  of  time. 
When  the  system  has  ample  time  to  plan  (such  as  in  a 
debriefing  session),  it  is  not  clear  whether  the  planning 
process will need to be qualitatively different. Presumably,
the system will still be able to use its incremental
planning  techniques,  but  generate  better  quality  plans 
because  it  has  more  time  to  evaluate  and  resolve  inter¬ 
actions. 
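
The resumable character of this kind of planning can be sketched as follows. This is a minimal illustration under our own assumptions, not the TacAir-Soar mechanism: `learned` stands in for the store of new recognition rules, `resolve` for whatever reasoning handles one goal interaction, and `budget` for the time available before planning must be suspended.

```python
from typing import Callable, Dict, List


def plan_incrementally(interactions: List[str],
                       learned: Dict[str, str],
                       resolve: Callable[[str], str],
                       budget: int) -> bool:
    """Resolve up to `budget` unresolved interactions; return True if done.

    `learned` persists between calls, so a suspended planning episode
    resumes where it left off instead of starting over.
    """
    for interaction in interactions:
        if interaction in learned:
            continue                      # already handled by a learned rule
        if budget == 0:
            return False                  # suspended; partial results are kept
        learned[interaction] = resolve(interaction)
        budget -= 1
    return True
```

With a small budget the function returns early but leaves its partial results in `learned`; calling it again with more time (as in a debriefing session) completes the remaining interactions without repeating earlier work.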

Also in response to the dynamic domain, our
initial efforts with TacAir-Soar have addressed the
issue of integrating planning with execution. Many
of the system’s actions can apply without regard for
whether  the  system  is  currently  planning.  For  any  as¬ 
pects  of  the  current  situation  that  do  not  depend  on 
the  current  planning  activity,  the  system  continues  to 
generate  behavior  independent  of  other  processing. 


Because  of  TacAir-Soar’s  large  state  representa¬ 
tion,  we  have  adopted  high-level,  qualitative  descrip¬ 
tions  that  summarize  direct  sensor  readings,  thereby 
reducing  the  amount  of  information  that  must  be 
copied.  In  addition,  the  system  attempts  to  make  intel¬ 
ligent  decisions  about  the  portions  of  the  state  it  cares 
about.  These  decisions  are  based  on  a  static  analysis 
of  the  domain  knowledge,  as  well  as  dynamic  reasoning 
based  on  the  current  situation.  This  allows  the  system 
to  limit  the  amount  of  work  it  does  in  creating  a  mental 
copy  of  the  state,  which  has  been  our  primary  concern 
in  preliminary  work  on  planning. 
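
A rough sketch of the idea, assuming hypothetical attribute names and bin boundaries (none of which come from the system itself): numeric sensor readings are summarized as qualitative values, and only the attributes judged relevant to the current planning episode are copied into the mental state.

```python
from typing import Callable, Dict, Iterable


def qualitative_range(nautical_miles: float) -> str:
    # Bin boundaries are illustrative assumptions, not values from the paper.
    if nautical_miles < 10:
        return "close"
    if nautical_miles < 40:
        return "medium"
    return "far"


def mental_copy(state: Dict[str, float],
                relevant: Iterable[str],
                abstractions: Dict[str, Callable[[float], str]]) -> Dict[str, str]:
    """Copy only the relevant attributes, each in qualitative form."""
    return {name: abstractions[name](state[name]) for name in relevant}


sensors = {"bogey_range": 35.2, "fuel_lb": 8123.0, "altitude_ft": 20000.0}
abstractions = {
    "bogey_range": qualitative_range,
    "fuel_lb": lambda lb: "low" if lb < 3000 else "ok",
}
# Only the bogey range matters for this (hypothetical) planning episode.
planning_state = mental_copy(sensors, ["bogey_range"], abstractions)
```

The copied state is both smaller and more abstract than the sensor state, which is what limits the copying work and leaves details to be filled in at execution time.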

Our  hope  is  that  this  approach  will  also  aid  the  sys¬ 
tem  in  reasoning  in  an  uncertain  environment.  As  we 
have  discussed,  an  appropriate  response  to  this  issue 
is to generate completable plans. In TacAir-Soar’s
terms,  we  wish  to  learn  new  rules  for  posting  general 
goals,  allowing  the  specific  situation  at  execution  time 
to  dictate  the  precise  actions  that  should  be  taken  to 
satisfy  those  goals.  Thus,  a  further  aim  for  setting  up 
a  mental  state  for  planning  is  to  abstract  away  details 
that  can  be  filled  in  by  the  situation  later. 

Finally,  the  criteria  for  terminating  the  planning  pro¬ 
cess  arise  in  part  from  the  solutions  we  have  already 
discussed.  If  there  is  time  to  plan  exhaustively,  the 
system  will  generate  solutions  for  all  the  goal  inter¬ 
actions  it  detects.  Because  the  system  returns  incre¬ 
mental  results  as  it  plans,  it  is  not  as  important  for 
it  to  determine  a  fixed  stopping  criterion.  If  planning 
must  be  suspended  temporarily,  the  partial  planning 
results should allow planning to resume from where
it  left  off.  Finally,  as  we  have  mentioned,  the  system 
is  able  to  generate  behavior  simultaneously  with  plan¬ 
ning  in  many  situations,  so  planning  will  not  have  to 
be  interrupted  until  it  is  actually  finished. 

Summary 

Simulated  tactical  air  combat  is  an  ideal,  real  domain 
for  developing  and  testing  new  planning  methods.  The 
complexities  of  the  task  require  us  to  focus  on  a  num¬ 
ber  of  planning  issues  that  can  be  safely  ignored  in 
traditional  planning  domains.  Although  many  of  these 
issues  have  been  addressed  to  some  extent  in  the  plan¬ 
ning  literature,  we  plan  to  provide  an  integrated  so¬ 
lution  to  all  of  them.  We  have  begun  creating  a  sys¬ 
tem  that  smoothly  integrates  reactive  and  deliberative 
planning  within  the  recognition-driven  problem  solving 
framework.  Although  our  efforts  with  the  deliberative 
planning  component  are  young,  our  initial  experiences 
have  been  encouraging.  Hopefully,  the  complexities 
and  real-time  demands  of  the  tactical  air  domain  will 
lead  us  to  a  system  that  can  model  a  continuum  of 
planning processes from purely reactive to
knowledge-intensive and deliberate.

Introduction

On November 4-7, 1994, the Department of Defense
held an operational exercise called STOW-E,
involving  over  1,800  entities  in  a  virtual  battlefield, 
making  this  one  of  the  largest  applications  of  real 
time,  multi-agent  simulation.  The  participants  in¬ 
cluded  both  humans  (in  simulators  and  specially  in¬ 
strumented  vehicles)  and  computer  generated  forces, 
interacting  in  real-time,  unscripted,  realistic  engage¬ 
ments.  By  1997,  DOD  plans  to  hold  a  virtual  theater- 
level  war  involving  up  to  50,000  entities.  These  sim¬ 
ulations  provide  a  cost-effective  and  flexible  environ¬ 
ment  for  training,  mission  rehearsal,  and  tactics  devel¬ 
opment.  The  computer  forces  are  implemented  via  a 
spectrum  of  approaches,  from  aggregate  forces  gener¬ 
ated  by  wargames,  to  human  managed  semi-automated 
forces (SAFORs), to completely autonomous intelligent
forces (IFORs). The computer forces dominate
in  terms  of  sheer  numbers,  with  at  least  10  times  as 
many  computer  generated  forces  as  human  forces. 

Our interest is in the development of IFORs, computer
agents with the ability to participate fully in all
aspects of the simulated battlefield. The Soar/IFOR
consortium, involving the University of Michigan, the
Information Sciences Institute of the University of Southern
California, and Carnegie Mellon University, is developing
IFORs for all military air missions: air-to-air
combat, air-to-ground attacks, air supply, anti-armor
attack, etc. IFORs must have many capabilities
to be successful: real-time reactivity, goal-directed
problem solving, planning, and large bodies of knowledge,
and they must coordinate their behavior with other
friendly forces. Furthermore, to be useful and effective
in training and tactics development, the tactical
behavior of our agents must be humanlike. We have
demonstrated the feasibility of developing IFOR agents
(Rosenbloom et al. 1994), and our agents fully participated
in STOW-E, flying air-to-air, ground attack, and
surface attack missions against human and computer
generated forces.

*This research was supported under contract N00014-92-K-2015
from the Advanced Systems Technology Office
of the Advanced Research Projects Agency and the Naval
Research Laboratory.

Within  this  domain,  coordination  is  one  of  the  most 
important  determiners  of  success.  A  single  unit  has 
only  limited  ability  to  sense  its  environment  directly, 
and  has  only  limited  ways  in  which  it  can  act.  Through 
coordination  of  sensing,  multiple  agents  can  share  their 
knowledge  about  the  environment,  thus  making  their 
actions far more effective. Through coordination of
their  actions,  multiple  agents  can  avoid  conflicting  ac¬ 
tion  and  they  can  perform  actions  that  no  single  agent 
can  perform  alone,  such  as  mutual  defense.  The  prob¬ 
lem  is  how  to  get  many  different  agents,  in  different 
physical locations, with different models of the environment,
with different physical abilities, and possibly
different short-term goals, to work together to achieve
their common long-term goals.

Previous  work  in  computer-generated  forces  (Calder 
et al. 1993) has not extensively modeled the coordination
of individual forces. In the majority of cases,
either  an  omniscient  human  or  computer  agent  pro¬ 
vides  the  appearance  of  coordination  through  low-level 
monitoring  and  controlling  of  individual  agents.  When 
tight  coordination  of  behavior  of  a  small  unit  is  re¬ 
quired,  such  as  a  section  of  planes  flying  in  formation, 
the  aggregation  is  treated  as  a  single  unit.  Instead  of 
attempting to represent the communication and coordination
of the individual planes, behavior is generated
for the section as a whole and then specialized for
the individual unit (Rao et al. 1994). Thus, individual
units are not faced with integrating coordination
activities  with  their  own  goals,  nor  do  they  need  to 
communicate  with  other  units. 

Our approach is straightforward. We model the
command and control methods currently in use by military
organizations. Thus, our agents directly model
the performance of humans: there is a one-to-one mapping
between our agents and humans. Our agents have
the same limits in perception and action that a human
would have, and they must coordinate their behavior
just as humans do, through shared knowledge and communication.
Some of the advantages of this approach
include:

1. Coordinated behavior is more realistic. Our agents
coordinate based on shared doctrine, shared missions,
and explicit communication. Explicit communication
requires time to transmit and interpret, and
is open to misinterpretation, jamming, etc. By independently
modeling each entity (instead of a group
as a whole), our agents can take the initiative when
appropriate.

2. Coordinated behavior should be easier for humans to
understand because there is explicit communication
to monitor.

3.  Coordination  is  possible  between  human  and  com¬ 
puter  forces  because  the  communication  is  modeled 
on  human  communication. 

In building our agents, we discovered that some
of the issues that have plagued previous research in
multi-agent coordination do not arise in this domain.
First, coordination is possible without the addition
of special purpose “architectural” capabilities (such
as the generation and transmission of partial-global
plans (Durfee & Lesser 1987; Durfee 1988)). An architecture
designed to support general intelligent agents,
such as Soar (Laird, Newell, & Rosenbloom 1987),
appears to be sufficient. Coordination does require
large bodies of knowledge and inference (it is a
knowledge-level capability (Huffman, Miller, & Laird
1993)), but these need not be specialized except in content.
Second, our agents do not need to carry out protracted
negotiations (Rosenschein 1993; Smith 1980;
Sycara 1989), “reason about the processes of coordination
among the agents” (Bond & Gasser 1988), or
dynamically construct complex models of those agents
(Ephrati & Rosenschein 1992). Because our agents are
designed to work within the military’s well-established
hierarchy of command, control, and communication,
and because our agents are “experts” for their tasks,
negotiation, runtime reasoning, and complex internal
models of friendly agents can be “compiled” out, with
just the knowledge of how and when to coordinate remaining.
Our agents require only very limited information
about other agents, such as locations, call signs,
radio frequencies, and positions within the command
hierarchy.

The goal of this paper is to demonstrate the sufficiency
of our simpler approach for a real-world application,
and to analyze the complexity of coordination
required in this domain.¹ We begin by presenting a
scenario that illustrates the coordination required in
this domain. We then analyze the required coordination
along three dimensions: the organizations of the
agents, the type of activities that are coordinated, and
finally the sources of knowledge that support coordination.
Next we identify the general capabilities that are
required to support coordination and how they are realized
in our underlying architecture of choice (Soar).
We conclude with a discussion of the limits of our
approach.

¹This is an extension of our earlier work on air-to-air
coordination (Laird, Jones, & Nielsen 1994).


Example  Scenario 

Our agents include pilots of fighter and air-to-ground
attack planes, and a variety of controllers that provide
mission and routing information to the planes. We attempt
to realistically model current military doctrine
and tactics. Our sources include unclassified military
documents, books, extensive interviews with former
U.S. Navy pilots, and observations of U.S. Navy pilots
training in real aircraft and in military flight simulators.
Our agents are built within TacAir-Soar (Rosenbloom
et al. 1994), our generic name for agents that
fly simulated fixed-wing aircraft developed within Soar.
Our agents’ simulation environment is based on the
DIS protocol (steering committee 1994). The agents
interact with the DIS world through ModSAF (Calder
et al. 1993), which provides simulations of vehicle dynamics,
sensors, and weapons. DIS (and ModSAF)
support distributed, interactive, real-time simulation
for ground, surface, and air entities. For example, in
STOW-E, our planes engaged both humans in simulators
and SAFOR computer generated forces. Our
planes fired simulated Exocet missiles at a real ship
(the Hue City) that was participating in the simulation
through special instrumentation, bombed a virtual
bridge, shot down humans in simulators, and were shot
down by humans in simulators and once by a virtual
surface-to-air missile. Each of our agents is an independent
Soar system situated in its own virtual vehicle
(such as an F-18), and is restricted to perceiving
what would be available to a human in such a vehicle
(via radar and vision). Communication between
agents takes place via simulated radios using messages
that approximate the messages sent by humans.

Consider the scenario in Figure 1 in which two
fighter planes (F-18’s) are flying as a section on an
air-to-ground mission. This is similar to a mission flown by
Soar agents during STOW-E, but has been expanded
for expositional purposes to include a broader variety
of coordination types (all of which are implemented in
our agents). The original goal of the mission is to bomb
Target 1. Once the planes are airborne, they join up
into a prebriefed formation (at time 1), and start on
their prebriefed flight plan (Elmer to Cougar to Wanda
to Target 1). The lead (F1) controls the section and
makes all section-level mission decisions, as well as flying
his own plane. The wingman (F2) flies so as to
maintain the current formation.

While flying their route, the lead checks in with a
controller (the TACC) to receive permission to enter
the combat area, and to receive possible changes in
routing information. The controller may request authentication
to verify that the plane is friendly. The
job of the controller is to verify that planes are where
they belong, to perform air traffic control to avoid collisions
(usually by assigning different routes and altitudes),
and to relay commands from other command
entities.

Let us assume that an E-2C (at time 2) informs the
planes of an approaching threat (the MiG-23’s). Usually
there would be other planes available to deal with
the MiG’s, but this allows us to illustrate some important
types of coordination. The MiG-23’s must depend
on a ground controller (GCI), which has a much
more powerful radar, to guide them toward the engagement
by giving them bearing and range information of
the F-18’s. Similarly, the F-18’s will get information
about the MiG’s from the E-2C. However, once the
F-18’s have the MiG’s on their own radar, and they have
been cleared to engage by their controller, they will
prosecute the engagement on their own, receiving information
from the controller only when they request
it (such as if they lose radar contact).

During the engagement, the lead of the F-18’s communicates
to the wingman to change formation to one
that provides more mutual support, and to “sort” the
MiG’s so that they each have a separate target. Similarly,
the MiG’s communicate to perform their tactical
maneuver (called a pincer). In this engagement, we
assume that both MiG’s are destroyed (at time 3). In
general, the F-18’s would jettison their bombs prior to
the engagement to increase their maneuvering ability,
but we will assume that they did not and continue their
ground attack mission.

Once the F-18’s destroy the MiG’s, they head back
to their next waypoint (Cougar). At about this time,
the forward air controller (FAC) comes under attack
by enemy tanks. The FAC calls to another controller
(the TAD) and requests an air strike. The TAD contacts
the F-18’s (at time 4) and gives them a new mission.
The lead of the F-18’s must then plan their final
attack altitude and geometry from the “initial point”
(Wanda) to the target and communicate it to the wingman.
As they approach the initial point (Wanda), the
lead communicates the mission to the FAC to verify the
target, etc. Once the FAC verifies the mission and visually
sights the planes, the FAC sends them a “cleared
hot” message to attack the target. The F-18’s perform
a 90-10 maneuver (planned earlier by the lead) to provide
separation during the final bombing run. Since
the tanks are moving, the F-18’s must visually acquire
them, modify their approach, and drop their bombs
(at time 5). They then exit the attack area and fly
back on their egress route (not shown).

Although our agents embody all of the reasoning
and communication required in this scenario, they do
not embody all that humans use in the complete range
of air-to-air and air-to-ground missions. For example,
our planes only fly in groups of two (sections), not in
groups of three or four (divisions). Also, a forward air
controller can mark a target for a plane by using a flare,
a beacon, or a laser. In addition, our E-2C agents do
not direct planes to specific air targets. Currently they
only provide contact information that our agents use
to make their own decisions. We plan to implement all
these types of coordination in the near future.

Coordination Analysis

The purpose of this analysis is to demonstrate the diversity
of coordination being performed by our agents
across a variety of dimensions.

Coordination  Organization 

The  previous  example  illustrates  the  three  organiza¬ 
tional  structures  of  coordination  used  by  the  military 
and our agents. In all cases, a section or a plane is controlled
by another entity (lead, or controller), but the
individual agents still have significant autonomy and
responsibility for their own actions. The command
structure is relatively static, and a section is in contact
with only a single controller at a given time. The
current controller for a section is determined by the
mission briefing, or by explicit communication from
another controller.

•  Master  slave:  The  lead  dictates  the  actions  of  the 
section, but the wingman still decides how best
to achieve and stay in formation. The wingman can
become  lead  if  he  has  better  situational  awareness 
or  weapons  capability. 

•  Centralized:  The  GCI  and  E-2C  (as  well  as  the 
TACC  and  TAD)  can  provide  information  and  con¬ 
trol  for  many  sections  of  planes. 

•  Distributed:  The  TACC,  TAD,  and  FAC  form  a  dis¬ 
tributed  control  network  in  which  requests  for  mis¬ 
sions  are  propagated  through  the  network  and  as¬ 
signed  to  sections.  The  controllers  coordinate  the 
activities  of  multiple  fighters  by  routing  them,  as¬ 
signing  altitudes,  communication  frequencies,  and 
attack  times. 

Types  of  Coordination 

Within  the  different  coordination  organizations  listed 
above,  the  agents  coordinate  a  variety  of  different  ac¬ 
tivities.  The  left  half  of  Figure  2  lists  the  types  of 
coordination  found  between  the  lead  and  wingman  in 
terms of coordinated action, sensing, missions, and section
organization. This organization is not represented
explicitly within the agents. Most of the coordination
occurs in terms of action: flying in formation and employing
their weapons. They coordinate in sensing, by
directing their radars so that they are not completely
overlapping. They explicitly communicate radar and
visual sightings. They also coordinate their execution
of their mission, and the lead will communicate
changes to the mission, current progress in the mission,
and intent, such as the decision to intercept enemy
planes. Finally, they coordinate the organization
of the section.

The  right  half  of  Figure  2  shows  the  types  of  co¬ 
ordination  found  between  a  section  and  a  controller. 
Here  there  is  no  coordination  of  their  joint  actions 
(although  the  mission  coordination  provides  indirect 
coordination  of  the  section  with  other  planes).  The 
coordinated sensing is in similar spirit to the coordination
within a section, although in this case the controllers
have much better radar capabilities. The most
involved coordination comes under the mission heading,
where the controllers can change almost any aspect
of the mission for a section, including the altitude
for flying routes, the routes, the controllers the section
contacts along the route, the radio frequency to use
during the contact, the target location, and the time
of the attack. Because the timing of an attack is critical
(e.g., it may be coordinated with the ending of an
artillery barrage), the controllers also provide a time
hack to synchronize everyone’s watches. The need to
attack at a specific time (+/- ten seconds) forces the
planes to adjust their speed dynamically or even go
into holding patterns.
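
The arithmetic behind this timing constraint can be sketched as follows. This is an illustration under our own assumptions, not the agents' actual rules: the function names, the speed limits, and the three action labels are hypothetical, and only the ten-second tolerance comes from the text.

```python
from typing import Tuple

TOLERANCE_S = 10.0  # the +/- ten second window mentioned in the text


def speed_for_time_on_target(distance_nm: float, seconds_remaining: float,
                             min_kts: float, max_kts: float) -> Tuple[float, str]:
    """Return (ground speed in knots, action) for arriving at the attack time."""
    if seconds_remaining <= 0:
        return max_kts, "late"
    required = distance_nm * 3600.0 / seconds_remaining
    if required > max_kts:
        return max_kts, "late"      # cannot make the time even at full speed
    if required < min_kts:
        return min_kts, "hold"      # too early: enter a holding pattern
    return required, "adjust"


def within_window(arrival_s: float, target_s: float) -> bool:
    """True if the arrival time falls inside the allowed attack window."""
    return abs(arrival_s - target_s) <= TOLERANCE_S
```

For example, with 60 nautical miles to run and 600 seconds remaining, the required ground speed is 60 × 3600 / 600 = 360 knots; if the required speed falls below the minimum flyable speed, the only way to arrive on time is to hold.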

Basis  of  Coordination 

In this domain, the key to coordination is knowledge.
The agents must know the appropriate techniques
and methods for performing their specific tasks,
such as maneuvering, sensing, and employing their own
weapons. They must know their responsibilities for
their current mission, the details of that mission, and
who the other agents are that they must interact with.
They also must know in general terms when and what
to  communicate  to  which  agents  during  the  mission, 
and  what  to  do  in  response  to  messages  from  others. 
We  have  identified  four  different  sources  of  coordina¬ 
tion  knowledge. 

Background Knowledge: Common Doctrine
and Tactics. Most of the long-term knowledge in
our agents consists of knowledge about how to perform
their missions. This includes how to maneuver, sense,
and employ weapons, but it also includes doctrine and
tactics which specify methods and procedures for coordinating
with other agents. This doctrine includes
specific roles for individuals (such as lead, wingman,
TACC, TAD, FAC) and the specific duties to be performed.
Thus, there is no need for the lead and wingman
to negotiate how to maintain the formation. The
wingman just does it. This is a social contract, where
agents implicitly create coordinated behavior by behaving
according to certain prespecified rules (Shoham
& Tennenholtz 1992).

The  reliance  on  common  background  knowledge  to 
support  coordination  in  the  military  is  not  surprising. 
The military has sufficient planning and training time
to develop and implement common doctrine and tactics.
The individual agents need not determine the
best coordination strategy on their own, but can rely
on compiled versions of the coordination strategy.

Mission Briefing. Before a mission, the participants
are briefed on the tactical situation (such as
weather and enemy activity), their responsibilities, and
the responsibilities of others. The briefing helps establish
specific operational parameters required for coordination,
such as the partners of a section, their initial
formations, the methods for communication (radio
frequencies, call signs), the default radar contract, the
default method for sorting enemy planes, any specific
tactics the section plans to employ, the waypoints of
the mission, the controllers who will be contacted during
the mission, the authentication procedures, and so
on. Based on the mission briefing, the lead will fill
in any details, such as an attack plan, based on the
mission and the tactical situation. In our agents, the
mission briefing knowledge is not “compiled” into the
agent, because it changes from mission to mission. It
is a set of parameters that are loaded into our agents
before they start their missions.
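
A briefing of this kind can be pictured as a plain parameter record that general-purpose knowledge reads at run time. The sketch below is illustrative only: the field names, call signs, formation name, and frequency are invented, and only the waypoint and controller names are taken from the example scenario.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class MissionBriefing:
    call_sign: str
    partner: str                  # the other member of the section
    formation: str
    radio_frequency_mhz: float
    waypoints: Tuple[str, ...]
    controllers: Tuple[str, ...]


def next_waypoint(briefing: MissionBriefing, current: str) -> Optional[str]:
    """General knowledge: fly to whatever waypoint the briefing lists next."""
    i = briefing.waypoints.index(current)
    return briefing.waypoints[i + 1] if i + 1 < len(briefing.waypoints) else None


# Loaded before the mission starts; values other than the waypoint and
# controller names from the example scenario are made up.
briefing = MissionBriefing(
    call_sign="Kiwi11", partner="Kiwi12", formation="combat-spread",
    radio_frequency_mhz=251.0,
    waypoints=("Elmer", "Cougar", "Wanda", "Target 1"),
    controllers=("TACC", "TAD", "FAC"),
)
# A controller retargets the section in flight: same rules, new parameters.
retargeted = replace(briefing, waypoints=("Elmer", "Cougar", "Wanda", "Target 2"))
```

Because the behavior-generating knowledge refers only to the parameters, the same knowledge serves every mission, and a change of mission amounts to replacing the record.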
Figure 2: Types of Coordination Between TacAir-Soar Agents.


Observed Behavior. During a mission, the members
of a section can directly observe each other’s behavior.
Thus, behavior alone can be a signal for coordination,
as when a lead makes a small turn. In
TacAir-Soar, the only use of coordination through observation
is when the wing responds to small turns of
the lead. This will be expanded when our agents need
to fly without radio communication because of jamming
or the danger of detection.

Explicit Communication. The most flexible way
to coordinate behavior is to explicitly communicate
knowledge and goals between two agents. In this domain,
it is via radio. We have attempted to replicate
the communication used by humans for the missions
performed by our agents. There are approximately 70
message templates that our agents can send and receive.
These message templates approximate the language
used by naval aviators and are easily understandable
by humans.
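
To make the notion of a message template concrete, here is a minimal sketch of template-based generation and interpretation. The two templates, the slot names, and the call signs are hypothetical stand-ins; the agents' roughly 70 actual templates are not reproduced here.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical templates; the actual TacAir-Soar templates are not given here.
TEMPLATES: Dict[str, str] = {
    "bogey-dope": "{receiver}, {sender}, bogey dope",
    "cleared-hot": "{receiver}, {sender}, cleared hot",
}


def generate(name: str, **fields: str) -> str:
    """Fill a prespecified template with the agent's internal values."""
    return TEMPLATES[name].format(**fields)


def _to_regex(template: str):
    # Escape the literal text, then turn each {slot} into a named group
    # that matches one whitespace-free token.
    escaped = re.escape(template)
    pattern = re.sub(r"\\\{(\w+)\\\}",
                     lambda m: "(?P<%s>\\S+)" % m.group(1), escaped)
    return re.compile(pattern)


def parse(message: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (template name, slot values) for a recognized message."""
    for name, template in TEMPLATES.items():
        match = _to_regex(template).fullmatch(message)
        if match:
            return name, match.groupdict()
    return None
```

A message produced by `generate` remains readable by a human listener, and `parse` recovers the slot values for the receiving agent; anything that fits no template is simply not understood, which is the price of avoiding full natural language processing.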

Coordination  Capabilities 

In this section, we summarize the cognitive capabilities
required to support coordination in our agents.
This is based on types of coordinated behavior (acting,
sensing, mission, etc.), and the methods for sharing
knowledge and goals. These capabilities serve as
a requirements list for constructing an agent that can
coordinate with others in domains such as tactical air
combat. For each capability, we describe briefly how it
is implemented in TacAir-Soar.

Extensive Knowledge Base. Our approach relies
heavily on the fact that the individual agents are experts
at performing their missions and interacting with
others. Each agent must have an extensive knowledge
base that includes all of the tactics and doctrine applicable
to its possible roles in the missions in which it
will participate. For example, a wingman must have
the same knowledge of doctrine and tactics as the lead,
so that the wingman can take over when necessary.

TacAir-Soar’s knowledge is encoded as rules. Our
attack  aircraft  have  over  2600  rules,  while  our  ground- 
based  controllers  have  over  1800  rules.  The  doctrine 
and  tactics  are  encoded  within  a  hierarchy  of  inter¬ 
twined  goals  that  are  dynamically  instantiated  based 
on  the  current  situation  and  mission. 

Parameter-driven Behavior. An agent must not
be limited to only one type of behavior, but must be
able to perform a variety of activities in coordination
with others. In our agents, the mission briefing received
before launch and the mission changes received
from controllers dynamically determine the goals of our
agents. The agent’s behavior must be parameterized
so that the knowledge relevant to the current mission is
used. This may sound trivial, but for some complex
missions, the information in the briefing may involve
fragments of plans that the agent must integrate into
its overall behavior at the appropriate times. Thus,
the generators of the agent’s behavior must be flexible
enough so that they can be modified at any time.

In TacAir-Soar, all mission-related behavior is based
on a representation of the current mission that is held
in working memory. This can be examined by the
rules that make up its long-term knowledge. The mission
is specified at briefing time, but also can be dynamically
changed later.

Reactive Execution and Interruptible Processing.
A wingman must respond quickly to changes in
the lead’s behavior. Computer generated forces must
in general be reactive, but coordination also requires
that they can interrupt their current goals to process
and respond to an urgent message.

In TacAir-Soar, the wingman’s main goal is to fly in
formation with the lead. Whenever the wingman is out
of position, rules fire to propose operators to modify
the heading, speed, or altitude. Whenever the wingman
receives radio messages from the lead, rules fire
to propose operators which in turn perform an action
appropriate to the message in the current situation.

Generate and Comprehend Messages. In order
to communicate with other agents, an agent must be
able to translate its internal information about its
goals, its perception of the world, and its current
actions, into a form that can be understood by other
agents. The converse is translating messages from
other agents into an internal representation that the
agent can work with. The general solution to both
requires full natural language.

In the current version of TacAir-Soar, we finesse the
general problem and use a template-based approach
where we prespecify the form of the messages that
the system can generate and accept. Our agents know
when to generate these messages. They also know how
to interpret these messages and modify their own internal
knowledge structures appropriately. Thus, we’ve
implemented the types of communication required by
our agents but have gone no further. However, the human
interactions themselves are very stylized, with a strong
emphasis on encoding information into short phrases
whenever possible. For example, a pilot might send
“bogey dope” to a ground controller, which is a request
for information on the current bogey that the pilot is
engaging. This approach has been successful for the
types of communication our agents need to produce,
and is natural enough that humans can fly as lead
or wingman with our agents using a simulation interface
with menu-driven communication that approximates
the cockpit of an F-14, MiG-29, or E-2C (van
Lent & Wray 1994). The human communicates with
our agents through a menu-driven interface, and the
messages from our agents are understandable to the
human. However, this approach will break down when
extended to unrestricted natural language interactions
with real pilots. To that end, we are investigating
general natural language approaches (Rubinoff & Lehman
1994).
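
The template-based approach described above can be illustrated with a
minimal Python sketch. Only the phrase “bogey dope” comes from the text;
the template names, slot names, and the other phrases are hypothetical
and are not taken from TacAir-Soar.

```python
import re

# Hypothetical message templates: name -> fixed phrase with named slots.
TEMPLATES = {
    "bogey-dope": "bogey dope",
    "bogey-info": "bogey bearing {bearing} range {miles}",
    "turn": "turn heading {heading}",
}


def generate(name, **slots):
    """Fill a prespecified template to produce an outgoing message."""
    return TEMPLATES[name].format(**slots)


def _pattern(template):
    """Turn a template into a regex whose named groups capture the slots."""
    parts = re.split(r"\{(\w+)\}", template)
    regex = ""
    for i, part in enumerate(parts):
        # Even positions are literal text, odd positions are slot names.
        regex += re.escape(part) if i % 2 == 0 else f"(?P<{part}>\\S+)"
    return re.compile(regex + r"\Z")


PATTERNS = {name: _pattern(t) for name, t in TEMPLATES.items()}


def comprehend(message):
    """Map an incoming message to (template name, slot values), or None."""
    for name, pattern in PATTERNS.items():
        match = pattern.match(message)
        if match:
            return name, match.groupdict()
    return None
```

A message outside the prespecified forms is simply not understood
(`comprehend` returns `None`), which mirrors the limitation noted above
for unrestricted natural language.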


Discussion

On the surface, our approach might appear to suffer
from rigidity because it depends on a set of “canned”
interactions based on existing doctrine and tactics.
However, our agents are not blindly applying a fixed
doctrine independent of changes in the environment.
Instead, our agents are continually reassessing the
situation, dynamically stringing together bits and pieces
of existing doctrine and tactics that are appropriate
to each situation, possibly generating novel behavior
(when viewed over time). Thus, our agents do very
well as long as the situation is covered by some
combination of existing military practice (which includes
defining new missions and many types of changes to
the organizational structure). In completely novel
situations, our agents will use whatever pieces of doctrine
are relevant to the situation. However, our agents
do not have the ability to step back and reason from
first principles about what would be a new, possibly
novel coordinated response to the situation (although
this is one of our research areas).

The long term goal of our work is to build intelligent
autonomous agents. In this paper, we have demonstrated
that it is possible to create agents for a complex
environment in which coordination is critical. Our
approach has been straightforward. We try to model the
coordination methods used by humans, and to date,
we have implemented coordination without negotiation,
extensive internal agent modeling, or special
architectural mechanisms. The coordination arises out
of shared doctrine and tactics, shared knowledge of
missions, observations of behavior, and explicit
communication. Our success is heavily dependent on four
characteristics of our domain which simplified the
implementation of coordination: the shared goals of the
agents, the expert-level performance (and knowledge)
of the agents, the well-defined methods and procedures
of the military that we are modeling, and the availability
of experts who are willing and able to provide the
details of procedures.

In the near future we will be extending our agents
to all military air missions, including helicopters, joint
missions with ground forces, and large scale coordinated
strikes (involving 20-30 aircraft at once). These new
missions will allow us to evaluate the sufficiency of our
approach in an even more complex and “coordination-rich”
domain.

Abstract

In support of the Soar/IFOR project’s
goal of providing intelligent forces for
distributed interactive simulation environments
[Laird et al., 1995], the NL-Soar
project works toward the implementation
of a full natural language capability for
Air-IFOR agents. In this paper we discuss
the design of that language capability
(NL-Soar) and its integration into
TacAir-Soar agents. In particular, we
demonstrate how NL-Soar’s linear complexity,
interruptibility, and atomaticity
of language processing provide language
comprehension and generation processes
that do not compromise agent reactivity.


1  Introduction 

Autonomous intelligent forces (IFORs) play an
increasingly critical role in both large-scale
distributed simulations and small-scale, focused
training exercises. An IFOR is a complex agent
that requires diverse capabilities to perform at a
useful level of functionality. Since an IFOR’s role
will often be to replace one or more individuals in
an engagement, the ability to communicate in
natural language can be a key capability contributing
to its overall performance. An agent that is rigid
in its communicative ability may introduce a
brittleness into the simulation (i.e. a tendency to fail
in unexpected ways) that has nothing to do with
imperfections in strategic or tactical knowledge.
Thus, in building TacAir-Soar agents to participate
in beyond-visual-range combat [Laird et al.,
1995], an NL capability is needed to ensure reactive,
human-like performance in basic interactions
among pilot, wing, and air intercept control (AIC).

In [Rubinoff and Lehman, 1994a] we identified
three main characteristics of communication
during air combat that present challenging
areas of research: (1) it occurs in real-time, (2) it
must seamlessly integrate with the agent’s
non-linguistic capabilities, e.g. perception, planning,
reasoning about the task, and (3) its content must
be comprehended and generated in accordance
with performance data, i.e. with all of the
idiosyncratic constructions, ungrammaticalities, and
self-corrections found in real language. Within the
context of these research issues, we introduced
NL-Soar, a language comprehension and generation
capability designed to provide integrated,
real-time natural language processing for systems
built within the Soar architecture [Lewis, 1993;
Nelson et al., 1994a; Nelson et al., 1994b;
Rubinoff and Lehman, 1994b]. In this paper we
concentrate on issues (1) and (2), exploring our progress
toward their solution using NL-Soar in Soar-based
Air-IFOR agents.

2  Demands  of  reactivity 

The naive approach to communication between
agents, and the one available using off-the-shelf
technology, treats language as front-end and
back-end interfaces. Messages are comprehended by
a front-end module, which creates a
system-dependent representation of the message that can
be used by the other modules responsible for the
agent’s behavior. Similarly, when an agent needs
to send a message, that same representation is
passed to a back-end module that generates an
output message to be directed to other agents.¹

This makes language an all-or-nothing
endeavor, the implications of which can be seen in
Figure 1. In this typical tactical air scenario, blue
is flying an intercept (1) and is actively pursuing
the goal of achieving its launch acceptability
region (LAR) when an incoming message arrives
(2). The message is buffered until the current goal
is achieved and blue has fired a missile (3). Next,
processing of the input begins (4); it ends sometime
after red has returned fire (5) and (6). Only
after the communication has been understood can
blue begin its evasive maneuver (7).

¹The approach being described here does not depend
in any way on the content of the message or
the style of language accepted and generated. Thus
it would apply equally whether the language passed is
natural language or a formal communication protocol
(such as CCSIL [Salisbury, 1995]).

Figure 1: All-or-nothing: a communication model that compromises reactivity

Figure 2: Reactive communication: interleaving comm and non-comm subtasks

It is clear that reactivity is compromised if
understanding must be postponed until the current
goal has been accomplished, and then is pursued
to the exclusion of all else. In particular, two cases
are cause for concern. Consider first what happens
at (2) if the content of the message is relevant
to the situation at the time it is received. In
this case, buffering the message leads, at best, to
wasted processing in the future (when the message
has become obsolete). At worst, buffering
compromises the decision making of the agent by
precluding access to timely, necessary information. To
remove this possibility, we could modify the control
of the agent to always attend to communication
needs first. But this would simply put us in the
second problematic situation more often.

In this second case (4), if the content of the
message is not critical, devoting processing to it rather
than other things can compromise the agent’s
reactivity as well. In short, shutting out either
communication processes or non-communication
processes can be equally dangerous. The point, of
course, is that you can’t tell which situation you
will be in until you process the message, at which
time it is too late to change your mind.²


²Dedicating a separate, parallel process to
communication might ameliorate the problem but won’t
necessarily solve it. A separate process will be able to
comprehend or generate the message while the agent
is performing other tasks, but will have to work in
isolation, i.e. cut off from the changing situation and
goals of the agent. To the extent that there is relevant
information that is unavailable during communication
processing, the agent may formulate interpretations or
communications that are inappropriate or out of sync.
To the extent that the relevant information is
communicated to the language process, parallelism is lost. In
the tactical air domain information is updated quickly,
and so an increasing proportion of CPU cycles will be
necessary to keep the two processes in sync. Thus,
to maximize reactivity, we conjecture that a separate
process for communication would be more costly and
no more effective than the method outlined in the
following section.

Figure 2 gives a more desirable version of the
same task events. Again, the pilot is flying an
intercept (1), trying to achieve firing position when
a message arrives (2). The message is attended to
immediately, its processing interleaved with the
ongoing effort to achieve LAR (3). In this example,
the message is completely processed by the
time the pilot is in a position to fire (4), and
evasive maneuvers can be started immediately, well
before red returns fire.

The model in Figure 2 overcomes the problems
in the simpler model of Figure 1 by intertwining
the different strands of agent behavior at the
subtask level rather than at the full task level. In
other words, we can view the all-or-nothing model
as a degenerate case of Figure 2, one in which the
granularity of the interleavable components is as
large as possible. As we have seen, the disadvantage
of choosing the maximal grain size is that the
components are too large for the agent to behave
in a timely fashion.
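
The effect of grain size can be illustrated with a small Python sketch.
It is only a toy model under stated assumptions: each task is a list of
equal-cost unit steps, the step counts are made up, and interleaving is
a strict alternation, which is just one of many possible orderings.

```python
from itertools import zip_longest


def all_or_nothing(task_steps, message_steps):
    """Buffer the message until the current task goal is achieved."""
    return list(task_steps) + list(message_steps)


def interleaved(task_steps, message_steps):
    """Alternate task and message subtasks, one step at a time."""
    schedule = []
    for task, msg in zip_longest(task_steps, message_steps):
        if task is not None:
            schedule.append(task)
        if msg is not None:
            schedule.append(msg)
    return schedule


def finish_time(schedule, prefix):
    """Return the 1-based tick at which the last step with this prefix runs."""
    return max(
        tick
        for tick, step in enumerate(schedule, start=1)
        if step.startswith(prefix)
    )


# Hypothetical workloads: six steps to reach firing position,
# three steps to understand the incoming message.
intercept = [f"fly-{i}" for i in range(1, 7)]
message = [f"msg-{i}" for i in range(1, 4)]
```

With these made-up numbers, the message is understood at tick 9 under
`all_or_nothing` and at tick 6 under `interleaved`. The total work is the
same in both schedules; what changes is how early the content of the
message becomes available to the agent.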

3  Achieving  Interleavable  Communication

For NL-Soar to provide a reactive, interleavable
language capability for IFOR agents, the system
as a whole must have three properties: linear
complexity, interruptability, and atomaticity. The first
property, linear complexity, means that processing
to understand or generate a message must take
time that is roughly linear in the size of the
message. This is necessary to keep pace with human
rates of language use. The second property,
interruptability, ensures that time-critical task
behaviors cannot be shut out by language processing
(and vice versa). The third property, atomaticity,
ensures that if language processing is interrupted,
partially constructed representations are left in a
consistent and resumable state.

To understand how NL-Soar provides the desired
communication model, we must first briefly
review the components out of which Soar systems
are organized. Figure 3 is a graphical representation
of a hypothetical Soar system that uses
NL-Soar for comprehension and generation. Linguistic
processes, like all processes in Soar, are cast as
sequences of operators (small arrows) that transform
states (boxes) until a goal state is achieved. The
triangles in the picture represent problem spaces
which are collections of operators and states.³ The
comprehension problem spaces contain operators
that use input from the perceptual system to build
syntactic and semantic structures on the state; the
generation problem spaces contain operators that
use semantic structures to produce syntactic
structures and motor output. Note that the problem
space labelled Top is the only space connected to
the perceptual and motor systems and it is this
space that is designated by the Soar architecture;
all other problem spaces are provided by the
system designer.

The dotted lines in the figure represent Soar
impasses which arise automatically when there is a
lack of knowledge available in the current problem
space. When an impasse arises, processing continues
in a subspace until the goal state in the
subspace is reached. Note that impasses are a general
recursive structure (a subspace can impasse into
another subspace) that gives rise to a goal/subgoal
hierarchy, or goal stack. The thick banded arrow
that overlays the impasse represents the resolution
of the impasse, and the new knowledge (called
chunks) that results from Soar’s learning mechanism.
Chunks capture the work done in the subspace,
making it available in the superspace without
impasse during future processing. This means
that when a system structured as in Figure 3 is
fully chunked all of its behavior will be produced
by operators in the Top space.

³For more details on how Soar uses problem spaces,
states and operators to organize its processing see
[Laird et al., 1987; Laird et al., 1995].

We now have all the pieces to build an
interleavable language capability. In the following
sections, we address how to achieve linearity,
interruptability, and atomaticity using these
components. For the time being we will consider
communication only in systems where the desired
behavior shown in Figure 2 would occur completely
within the Top problem space when fully chunked.
We call a system organized in this way a Top-state
control model.⁴

3.1  Achieving  Linear  Complexity 

Communication in an IFOR must occur in real-time
to keep pace with the flow of human events.
This is not a statement about how fast the system
must run, per se. Rather, it is a theoretical
statement about how processing must occur
within the system. Although there is some
variability (some words do reliably take longer to
process than other words), in general, the amount
of time taken by people is linear in the number
of words in the utterance. A number of design
constraints follow from this simple regularity
[Lehman et al., 1996], e.g. construction of the
meaning of the sentence must proceed incrementally,
and different knowledge sources (syntax,
semantics, pragmatics) must be applied in an
integrated rather than pipe-lined or multi-pass
fashion. NL-Soar provides these properties [Lehman et
al., 1991a; Lewis, 1993]. Briefly, the system relies
on Soar’s notion of impasse to control the search
through its linguistic knowledge sources, and then
on Soar’s learning mechanism to compile the
disparate pieces of knowledge into an integrated form
that can be applied directly (i.e. in approximately
constant time/word) in the future.

⁴As we will see in Section 4, this is not the only
structure permitted by Soar, but it is a valid
organization and the simplest place to begin.

Figure 4: Achieving linearity through learning

Figure 4 depicts the process graphically for one
type of language operator, expanding the left
portion of Figure 3. Consider the arrival of a new
word into the Top state and assume that the
system has not encountered the word in a similar
context in the past (i.e. the system has no
pre-chunked knowledge about how to process this
word). Once the word has been attended to, the
learn-comprehension operator will be selected,
after which an impasse will arise. Problem solving
will continue in the Create-operator space which
will generate a symbol for a new u-constructor.
A u-constructor is a language operator that fits
the new word into the current syntactic structure
for the message. The u-constructor is composed
piecemeal in the U-construct space which
performs links and snips on syntactic trees based on
knowledge provided by Generate and Constraint-check.
As the goal of each subspace is achieved,
each impasse is resolved, creating chunks. Only
two kinds of chunks concern us here. The
implementation of the u-constructor is contained in the
chunks created when the impasse between
Create-operator and U-construct is resolved. This means
that the syntactic tree that resulted from the
sequential links and snips that were done in the
lower spaces will now be produced immediately
whenever this u-constructor executes. The
u-constructor itself is returned from Create-operator
to the Top space, resulting in a chunk that tells
when this u-constructor can apply in the future.
Note that the next time this word is seen in a
similar context, this chunk will propose the new
u-constructor directly in the Top state. In other
words, once we have learned the top-level operator,
no impasse will occur. Instead, the (possibly
lengthy) problem solving that took place in the
subspaces has been compiled into a single
Top-space operator that executes directly to build the
relevant syntactic structure on the Top state.

Figure 4 shows how the application of general
knowledge about syntax is contextualized and
made efficient. A similar story can be told about
s-constructors, the Top-space operators that fit the
new word into the semantic and discourse structures
maintained on the Top state. Thus, once
behavior is fully chunked, the arrival of a message
results in only a small number of Top operators per
word, the linear complexity we were after. Equally
important, the language process itself is now
represented in the Top space in terms of more
finely-grained operators (u- and s-constructors) that
create the opportunity for interleavability. On the
generation side, of course, there is a different task
decomposition producing a different set of
Top-space operators, but the principle is the same.

3.2  Achieving  Interruptability

In Soar, agent behavior is produced by the
application of operators to a state. Moreover, the
architecture defines the application of an operator as
a non-interruptable unit of work. In other words,
once an operator has been selected for application,
all the state changes associated with that operator
are guaranteed to be made before any other
operator is selected. What does this mean for NL-Soar?
In short, it means that the Top-level language
operators dictate the granularity of the interleavable
components. To anchor the point in the context of
Figure 4, once a u-constructor exists, we cannot
interleave changes to the syntax tree with other
non-linguistic tasks. Put more strongly, once the
u-constructor is selected, all other subtasks are shut
out for the duration of its application. In addition,
if the Top state changes during the application of
the u-constructor (via perception), those changes
are effectively invisible until the u-constructor’s
state changes have been made.⁵

⁵This is an overstatement. In fact, it is possible to
encode knowledge in Soar in such a way that it is tied
only to the state, not to any particular operator. Such
knowledge will lead to state changes regardless of what
operator is being applied. Since most task knowledge
is tied to task operators, however, the discussion above
is still a useful way to think about what’s going on.

How is this situation different from the one in
Figure 1, where lack of interruptibility meant
reactivity was diminished to the point of inviting
wasted work, if not disaster? The difference here
is that the granularity of NL-Soar’s operators is
small enough to allow interruptibility below the
full task level. The current scheme separates the
work of attention from work done to the syntactic
tree (u-constructors) from work done to the
semantic and discourse models (s-constructors).
Thus, the current comprehension capability
allows for interruption between each set of state
changes. Note, however, that we could have made
this choice differently. We could, for example,
build both syntactic and semantic structures in
the impasse under the learn-comprehension
operator. The resulting Top-space comprehension
operator would effectively bundle all of
comprehension into a single operator.⁶ Alternatively, we
could make link and snip the Top operators,
giving an even finer grain. Although it is clear that
the architecture permits a wide range of choices,
choosing the right granularity is not a wholly
unprincipled exercise. In general, the more work
encompassed by a Top operator, the more specific
will be the conditions under which it can apply.
The more specific the conditions, the less transfer
of the knowledge to new situations and the more
learning events will be required to get fully
chunked language behavior. On the other side, the less
work encompassed by a Top operator, the more
operators per word there will be, until, eventually,
the number will reflect some non-linear quantity
(e.g. the size of the parse tree). In Section 4,
below, we demonstrate how the operator granularity
we have chosen allows both transfer and interleaving
while maintaining linearity.

Now that we have language operators of a size
that allows interruptibility, the next question that
needs to be addressed is: how do you decide which
type of operator, linguistic or non-linguistic, to
select next? Many control schemes are possible,
ranging from random selection to a complete
partial ordering over all the operators in the system,
to always attending to communication first (or
last). In integrating NL-Soar with TacAir-Soar we
will use random selection for its simplicity. What
is important to remember, however, is that under
Top-state control the selection decision is made on
an operator by operator basis, not task by task.
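
Operator-by-operator selection with a random choice among candidates
can be sketched as a simple loop. The Python below is a hypothetical
illustration, not the Soar decision cycle: the proposer functions, the
dictionary used as the Top state, and its keys are invented for the
example.

```python
import random


def run_top_state(proposers, state, rng, max_cycles=100):
    """Select and apply one operator per cycle until none is proposed.

    Each proposer inspects the state and returns either None or a
    (name, apply_fn) pair. Selection among proposed operators is random,
    and each apply_fn runs to completion before the next selection.
    """
    trace = []
    for _ in range(max_cycles):
        proposed = [op for op in (p(state) for p in proposers) if op is not None]
        if not proposed:
            break
        name, apply_fn = rng.choice(proposed)
        apply_fn(state)  # atomic: no other operator runs until this returns
        trace.append(name)
    return trace


def propose_language(state):
    """Propose comprehension of the next unprocessed word, if any."""
    if state["words"]:
        return ("comprehend", lambda s: s["parsed"].append(s["words"].pop(0)))
    return None


def propose_flight(state):
    """Propose one more step toward firing position, if not yet there."""
    if state["steps_to_lar"] > 0:
        return ("fly", lambda s: s.update(steps_to_lar=s["steps_to_lar"] - 1))
    return None
```

Calling `run_top_state([propose_language, propose_flight], state,
random.Random(0))` on a state with pending words and remaining flight
steps produces a trace in which the two kinds of operators are mixed in
some order, with the decision remade after every single operator.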

3.3  Achieving  Atomaticity 

⁶An early version of NL-Soar did, in fact, use this
scheme [Lehman et al., 1991b].

Recall that atomaticity ensures that if language
processing is interrupted, partially constructed
representations are left in a consistent and
resumable state. Given our discussion above,
it would seem that the architecturally enforced
non-interruptability of operators would guarantee
atomaticity as well. This is certainly true if all of
the language behavior is impasse-free. Suppose,
however, that the system is in the middle of
learning a new u-constructor or s-constructor, as in
Figure 4, when state changes create a preference for
a non-linguistic Top-space operator. In this case,
once the operator currently being applied in the
lowest subspace is finished, the task operator will
be selected in the Top space and the language goal
stack will collapse. Can we be sure that we have
been left in a consistent state so that language
processing can be smoothly resumed?

Figure 5: The lead TacAir agent composes a message while tracking a threat and flying

The answer is yes because the design of NL-Soar
ensures that no changes are actually made to the
language data structures on the Top state until the
u-constructor is returned. Look again at Figure
4. The only operator that can result in changes to
the Top state is Create-operator’s return-operator.
But if it is being applied when a preference is
created for a Top-space task operator, then we know
it will complete, the results will be returned, and
the u-constructor proposal chunk will be built. If
the subspace operator is not the return-operator,
no results will be returned from the top-most
impasse and no proposal chunk will be built for the
u-constructor. Observe, however, that the
conditions that led to the learn-comprehension
operator in the Top space may well still obtain. So
once the task operator has been applied, language
may be resumed. Since no u-constructor was built,
the system will have to rebuild the goal stack
to continue. In practice, the situation is not as
bad as it sounds because chunks may have been
built in the subspaces during the previous
learn-comprehension processing that were not specific
to the particular u-constructor. These chunks will
transfer to the current situation and the impasses
that created them will be avoided.
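
The argument above amounts to a commit-on-return discipline: work in the
subgoals is private until a single final write to the Top state, while
reusable sub-results survive an interruption. The following Python sketch
illustrates that idea only; the class, the exception, and the step names
are hypothetical and do not describe NL-Soar’s actual data structures.

```python
class Interrupted(Exception):
    """Raised when a task operator preempts language processing."""


class LanguageLearner:
    def __init__(self):
        self.top_state = []      # committed results, visible after resumption
        self.sub_chunks = set()  # reusable results learned in subspaces

    def learn_comprehension(self, word, steps, interrupt_at=None):
        """Build a constructor for `word` in a scratch area, then commit.

        `steps` names the hypothetical subspace steps needed. Returns the
        number of steps that actually required problem solving.
        """
        scratch = []  # private to the subgoal; lost on interruption
        worked = 0
        for i, step in enumerate(steps):
            if interrupt_at is not None and i == interrupt_at:
                # Goal stack collapses; top_state has not been touched.
                raise Interrupted(word)
            if step not in self.sub_chunks:
                self.sub_chunks.add(step)  # chunk survives the interruption
                worked += 1
            scratch.append(step)
        # The only write to the Top state, made in one step at the end.
        self.top_state.append((word, tuple(scratch)))
        return worked
```

If a call is interrupted partway through, `top_state` is unchanged, and a
later call for the same word redoes the loop but skips the problem solving
for any step whose chunk was already learned.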

4  Bringing  it  all  together  in  TacAir-Soar

In Section 2 we argued that a communication
capability for IFORs had to have three properties:
linear complexity, interruptability, and atomaticity.
In the previous section we introduced the
Top-state control model in which whole tasks are
interleaved on an operator-by-operator basis and
communication is just another task. One of the
interesting characteristics of systems organized as
in Figure 3 is that the goal stack is never shared
across linguistic and non-linguistic tasks; the need
to understand or produce a message pulls the
system out of a task goal stack. As a result, Top-state
task operators, like the Top-state language
operators, tend to represent subtasks of fairly short
duration.

In contrast, systems like TacAir-Soar are
composed of a Top task operator of very long
duration, and a goal stack that reflects many levels
of abstraction of that task. Each level stays
active as long as it is being carried out. In
particular, TacAir uses Soar’s Top state to keep track
of the “execute-mission” task, which stays active
for the entire simulation. Under this will be a
stack of sub-tasks, such as “mig-sweep”,
“intercept”, “employ-weapons”, and so on, each
representing a more detailed view of what the agent is
currently trying to do. Much of TacAir’s
knowledge of its current situation and goals is stored in
sub-states associated with these subtasks, not on
the Top state.⁷ Thus, if TacAir switched to
language in its Top state, it would lose much of this
knowledge. Clearly, TacAir-Soar is incompatible
with the Top-state control model outlined above.
To understand how to modify Top-state control
without sacrificing linearity, interruptibility and
atomaticity, we must answer the question: what
role, exactly, does the Top state play in
maintaining each property?

For linear complexity, the role played by the Top state is
simply a place to apply the so-called Top-state operators. In
reality, what is critical for linear complexity is that there is
an effective procedure for building the top-level language
operators, and that only a small number of them are necessary
for each word in the message. For interruptability and
atomaticity, the Top state does play a more central role.
Specifically, it must be the place where Top-level language
operators leave


^A fuller description of TacAir-Soar can be found
in [Laird et al., 1995].
Figure 6: Figure 5 continued: Pilot continues to talk as wing begins to listen


their results because it is the only state that is guaranteed
to still be in the goal stack when language processing resumes.
Thus, where top-level language operators are applied is
immaterial as long as they leave their results on the Top state
where they can be found whenever, and wherever, language
processing resumes.

Separating the question of where top-level language operators
are applied from the question of where they leave their results
allows us to define a variety of virtual Top-state control
schemes. The simplest one, and the one we use when integrating
NL-Soar with TacAir agents, is to interleave language operators
with whatever task operators are available in the lowest problem
space in the goal stack. Because the goal stack grows and
shrinks over time, the interleaving of communication will take
place more or less throughout the range of non-linguistic
subtasks. The simplicity of the integration is extended by
allowing the architecture to decide randomly between language
and non-language operators whenever both types are applicable
in the current situation.
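
The scheme just described can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not the NL-Soar or TacAir-Soar code: the function name `run_interleaved`, the dictionary used as the Top state, and the "pop" pseudo-operator are all hypothetical. What it shows is the one property the text relies on: language operators may be selected at any depth of the goal stack, chosen at random against task operators, but they always leave their results on the Top state.

```python
import random


def run_interleaved(words, task_ops, seed=0):
    """Interleave language and task operators at random (illustrative only)."""
    rng = random.Random(seed)
    top_state = {"utterance": []}      # the only state guaranteed to persist
    goal_stack = ["execute-mission"]   # long-lived Top task operator
    words, task_ops = list(words), list(task_ops)
    trace = []
    while words or task_ops:
        # Both kinds compete only when both are applicable.
        kinds = [k for k, queue in (("language", words), ("task", task_ops)) if queue]
        kind = rng.choice(kinds)
        if kind == "language":
            word = words.pop(0)
            # Applied wherever we are in the stack, but the result goes on Top.
            top_state["utterance"].append(word)
            trace.append(("language", word, len(goal_stack)))
        else:
            op = task_ops.pop(0)
            if op == "pop":            # hypothetical marker: a subtask finished
                if len(goal_stack) > 1:
                    goal_stack.pop()
            else:
                goal_stack.append(op)
            trace.append(("task", op, len(goal_stack)))
    return top_state, goal_stack, trace
```

Whatever the random seed, the words accumulated on the Top state come out in order, even though the depth of the goal stack at which each language operator ran varies from run to run.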

Figures  5  through  7  capture  a  portion  of  the 
behavior  of  two  TacAir-Soar  agents  running  with 
a  fully-chunked  NL-Soar  under  virtual  Top-state 
control.^ In the scenario depicted, two pilots fly

^Requiring NL-Soar to “learn while doing” would
be equivalent to expecting the pilot to learn the domain
language while flying the plane in battle. Consequently,
we use off-line training to allow NL-Soar to
learn from experience in a non-real-time setting. This
gives  the  system  the  time  it  needs  to  integrate  its  dis- 


F14s as a section with a single red plane flying against them.
Parrot101 is the lead and Parrot102 is the wing. The timelines
in the figures show the operators that each agent executed in a
particular engagement, together with those events in the
external world that affect or depend on their behavior. Language
operators are indicated via bold-face. For simplicity, the
representation does not try to preserve the goal-subgoal
relationship of the task operators.

In the time prior to the first event shown in Figure 5, the two
planes have begun to fly in a racetrack configuration. The
portion of behavior we are interested in begins when the lead
notices the bogey (1), and must communicate the relevant
information to its wing. The report-contact operator (2) posts a
communicative goal on the Top state indicating that the agent
wants to say something. Interleaving begins (somewhat unevenly
due to the random control scheme) at (3). First, three task
operators are executed in which the agent determines that the
bogey is in fact a bandit, decides to check whether the commit
criteria have been satisfied (they have not), and notices that
the bandit is within missile range. Then, at (4), language
operators begin to compose the message according to
communication doctrine. The first step in any lead-wing
communication is the

parate  knowledge  sources  into  the  top-level  operators 
discussed in Section 3.1. It is this highly compiled
form of language knowledge that models an experienced
pilot and provides real-time language behavior
on-line. 
Figure 7: Figure 6 continued: completion of summons generation


exchange of callsigns, here, the sentence “Parrot102 this is
Parrot101.” This is a domain-dependent instance of the more
general class of utterances we call summons (for example the
telephone exchange “John? It’s Jill.”) The summons is
constructed piece by piece using top-level generation operators
(in boldface). Figure 5 shows this linguistic process
interleaved with operators that contribute to situation
awareness (5) and operators that fly the plane (6), (7), and
(8).

Figure 6 continues the timeline for Parrot101 and introduces
Parrot102 at the point just before the first word of the summons
arrives into the agent’s input buffer. The timelines are aligned
by the linguistic output of Parrot101 and the linguistic input
of Parrot102.

To this point in the scenario, the wing has simply been flying
a racetrack with the lead. At (9) Parrot101 outputs the wing’s
callsign in the upper timeline. Note that this is done even
though the construction of the remainder of the summons is still
being interleaved with non-linguistic subtasks (10) through
(12); both generation and comprehension are incremental.
Meanwhile, shortly after Parrot102 has begun to turn (13), the
callsign is heard (14). The lower timeline continues with
comprehension of the first few words of the summons ((16) and
(18)) interleaved with operators that keep the wing in formation
((15) and (17)). Note that the s- and u-constructors for the
word “this” (18) fire after the word “is” has already been
heard. This is partly because the lead’s message is coming out
quickly, and partly because the wing’s attention has been
focused on flying the plane. The input buffer that holds
unattended speech has a decay rate; as in people, if speech goes
unattended long enough (as it may if the pilot is in a stressful
situation), it simply disappears from the buffer.
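
A decaying input buffer of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the NL-Soar implementation: the class name `DecayingInputBuffer`, the `lifetime` parameter, and the use of abstract time units are assumptions, since the text does not give the decay rate or say how it is measured.

```python
from collections import deque


class DecayingInputBuffer:
    """Holds heard words until attended; words older than `lifetime` vanish."""

    def __init__(self, lifetime):
        self.lifetime = lifetime   # assumed unit: simulated seconds
        self._items = deque()      # (arrival_time, word), oldest first

    def hear(self, word, now):
        """Speech arrives whether or not the agent is attending to language."""
        self._items.append((now, word))

    def attend(self, now):
        """Return the oldest word that has not yet decayed, or None."""
        while self._items and now - self._items[0][0] > self.lifetime:
            self._items.popleft()  # unattended too long: silently lost
        if self._items:
            return self._items.popleft()[1]
        return None
```

If the wing spends too long on flying operators between calls to `attend`, the earliest words of a message are dropped, which is the behavior the paragraph above attributes to a pilot under stress.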

Figure 7 continues the interchange to the point that Parrot101
outputs the final word of the summons (19). There is no
interleaving in this portion of the trace because both pilots
are simply flying the long leg of the racetrack where no task
operators are proposed. Notice that by the time the lead has
begun the second portion of the summons, the wing has caught up
on the comprehension side (19). The rapidity with which “I have
a contact” emerges, however, once again results in buffered
input for Parrot102 (21). Thus, linguistic processing continues
in the wing agent after the lead has already begun to wait for a
reply (not shown). As a final observation, note that the same
u-constructor that processes “Parrot101” in Figure 6 also
processes “I” in Figure 7 (u-constructor2). This is an example
of where the granularity of the top-level operators affords
some transfer of syntactic
processing  despite  the  difference  in  semantics  (s- 
constructorfi  vs.  s-construct26). 


5  Conclusions 

The ability to communicate in natural language can be a key
capability contributing to an IFOR’s performance in both
simulation and training exercises. In this paper we have
discussed how the design of NL-Soar uses linear complexity,
interruptibility, and atomaticity of language processing to
provide a language capability that does not compromise
reactivity. What we have not discussed, however, is the third
area of interest identified in [Rubinoff and Lehman, 1994a]:
performance in accordance with empirical data from pilots in
real-life simulations. Our continued work, therefore, will focus
on making the NL-Soar integration more robust, including
handling linguistic constructions specific to the domain and
allowing for the interruptions and self-corrections that
necessarily come with real language use.
The clever combatant looks to the effect
of combined energy, and does not require
too much from individuals.

Sun Tzu, The Art of War

1.  Abstract 

The effectiveness of intelligent computer generated forces is
limited by their ability to closely coordinate their actions
within the overall battlefield situation. We have developed
intelligent command and control agents which monitor large
sections of the battlefield and deploy other forces for
increased effectiveness. These agents have been demonstrated in
the air to air, close air support, and air strike domains.

2.  Introduction 

Our goal is the development of intelligent forces (IFORs),
computer agents which are functionally indistinguishable from
human agents in their ability to interact with the synthetic
environment. The Soar/IFOR consortium, involving the University
of Michigan, Information Sciences Institute of the University of
Southern California, and Carnegie Mellon University, is
developing IFORs for all military air missions: air to air, air
to ground, air supply, anti-armor attack, etc. IFORs must have
many capabilities to be successful, including: extensive
knowledge, real-time reactivity, goal-directed problem solving,
and planning. Additionally, they must coordinate their
activities with other friendly forces (Laird et al., 1995a).

To fully support very large scale battlefield simulations, such
as those envisioned for STOW-97, intelligent computer generated
forces cannot act independently; rather, they must coordinate
their efforts for increased effect just as humans do. This
requires a means and a method for coordination, the ability to
convey coordination information, and the ability for large scale
situation assessment. In military parlance this is commonly
referred to as command, control, communications, and
intelligence (C³I).

This paper discusses our current state of development of
intelligent, realistic C³I agents for simulation in the air
domain. These agents have been implemented using ModSAF (Calder
et al., 1993) and the Soar/ModSAF interface (Schwamb et al.,
1994).

The remainder of this introductory section provides an overview
of the C³I domain and some motivation for this work. Section 3
has a description of the C³I agents implemented by this project
to date. Section 4 discusses the general responsibilities of
each agent and goes on to show how our agents demonstrate each
of the C³I functions. Section 5 provides an extended example of
the interaction between multiple C³I agents and a section of
planes flying close air support. Section 6 discusses research
and open problems. Finally, Section 7 provides general
discussion and conclusions.

2.1.  Domain  Overview 

Previous work in computer generated forces has either focused
on individual agents working in relative isolation or groups of
agents which may be treated as a whole (Rao et al., 1994). A
notable exception is (Balias et al., in press). These approaches
avoid the problems of C³I by allowing human guidance, but when
the agents number in the tens of thousands, finding enough
people to control them is infeasible.

In 1994, the Soar/IFOR project was tasked to provide automated
pilots for all air vehicles and missions in support of STOW-97.
(See (Laird et al., 1995b) for an overview of the current state
of this project.) In order to accomplish this task we needed to
extend the scope of the project to include those interactions
necessary between pilots and controllers, even if they are not
airborne. For example, orange agents are at a severe
disadvantage if they cannot rely on ground based radar control
(GCI) to track threats outside the limited scope of their own
radar.

The most C³I intensive missions we have implemented to date are
air to air combat and close air support (CAS). In the air to air
domain, the controller may be responsible for maintaining a
defensive perimeter around the carrier battle group, locating
potential threats, confirming that an unknown aircraft is a
threat, providing timely updates until friendly planes have
radar contact, then issuing additional information in response
to queries.

While air to air combat has a single controller (or a small
number of controllers), the close air support domain
demonstrates a wide variety of controllers. In the CAS domain,
the attack planes must have detailed integration with multiple
agents because of close proximity between targets and friendly
forces. These controllers communicate with the planes (locating
targets and deconflicting) as well as amongst themselves
(requesting missions and allocating forces).

2.2.  Motivation 

The primary motivation for doing this work is to develop
realistic C³I agents. The IFOR C³I agents should be
indistinguishable from human agents performing similar
functions. This involves believable interactions with the
simulator as well as interactions with other agents and humans
at a natural level. By basing IFOR agents on Soar, a theory of
cognition (Laird & Rosenbloom, 1994; Laird et al., 1987; Newell,
1987), and modeling not only the externally observable behavior,
but plausible thought processes which are necessary to produce
realistic behavior, we intend to overcome both dumb, canned
responses and implausible, superhuman responses.

The second motivation for doing this work is effectiveness.
Without C³I agents our automated pilots have only limited
ability to sense and interact with their environment. Enemy
agents can sneak up behind them or fly around them. In addition,
the automated pilots have only limited ability to change their
mission. Without the large scale perspective provided by the
controller, they don’t even realize that there might be a need
to change their mission.

Adding C³I can increase the level and types of applications for
military simulation. As battlefield simulators become more
realistic, we want to make them available for more advanced
purposes. The major use of air simulators to date is in pilot
training. By providing intermediate level controllers, we expect
to make simulation usable not only in pilot training, but also
in training human controllers to interact with and control
these controllers.

Finally, we wish to study human cognition and the ability to
model it in Soar. C³I provides a new domain for this research
which suggests more knowledge and exhibits different types of
knowledge than that used by aircraft pilots.

3. C³I Agents

In order to increase realism and promote plausibility at
various levels, we base C³I on existing techniques currently in
use by military organizations and embody them in specialized
agents corresponding to military controllers. Thus there is a
direct one to one mapping between our agents and humans.

Currently, we have operational versions of the following C³I
agents:

•  Air  Intercept  Controller  (AIC)  which  assigns 
planes  to  stations,  spots  threats,  and  provides 
information  about  enemy  planes.  The  AIC 
is  airborne,  situated  in  a  plane  with  a  large 
radar,  such  as  an  E-2C. 

•  Ground  Controlled  Intercept  (GCI)  performs 
the  same  sort  of  mission  as  an  AIC  but  is 
ground  based  and  immovable. 

•  Forward  Air  Controller  (FAC)  which  locates 
targets  and  provides  final  directions  for  close 
air  support.  Forward  air  controllers  may  be 
either  ground  based  or  airborne  (FAC(A)). 

•  Direct Air Support Center (DASC) which assigns
aircraft to missions, potentially alters the missions,
and hands off attack missions to the FAC. The DASC is
ship based, usually on the aircraft carrier.

•  Tactical Air Direction Controller (TAD) directs
air operations within the Amphibious Operations Area
(AOA) prior to the establishment of a DASC. The TAD is
also ship based and may be colocated with the DASC.

•  Fire Support Coordination Center (FSCC) determines
the type of support to utilize (CAS, artillery, naval
gunfire). If CAS is determined it generates a Joint
Tactical Airstrike Request and coordinates CAS requests
with the DASC. The FSCC is ground based within the AOA.

•  Tactical Air Command Center (TACC) which provides
air traffic control, routing, and deconfliction within
the AOA. The TACC is ground based and usually colocated
with the FSCC.

In the following section we explore how agents demonstrate the
capabilities necessary for coordinating the behaviors of
multiple agents.

4.  Responsibilities 

In addition to the specific responsibilities of each agent
given above there are several general responsibilities
associated with C³I agents. These responsibilities are broken
out into separate topics, but it must be realized that to work
effectively all of these activities must be going on
simultaneously.

4.1.  Command 

C³I agents are responsible for mission initiation as well as
tracking and modifying the mission as it develops. Typically the
planes will have a pre-briefed mission, but often this mission
will need to be changed or replaced entirely as the battlefield
situation develops. Our command agents can change almost every
aspect of a mission including assignment of individual CAP^
stations, routes, target times, and the final targets.

In order to effectively carry out their command function, C³I
agents need to have a command organization. We’ve observed two
different command organizations for C³I agents.

^  Combat  Air  Patrol 


In the air to air domain command is centralized. Either the AIC
or the GCI is responsible for all air traffic. These agents
provide continuous control and information for many sections of
planes. Though there may be multiple controllers acting at the
same time they have clearly separated duties, and there is very
little interaction.

In contrast, in the CAS domain command is decentralized. As the
planes fly through different regions they are directed by
multiple controllers, all of which are responsible for the
ultimate success of the mission. Though there is still a chain
of command, because of limited numbers of radios and limited
broadcast range the planes may not be in continuous contact with
any single controller.

The controllers in CAS need to coordinate not only the planes,
but also themselves. The TACC, DASC, FSCC, and FAC have to form
a distributed control network in which mission requests and
assignments are propagated through the network.

4.2.  Control 

The  mission  of  a  controller  is  to  continually  assess 
the  situation  then  allocate,  or  re-allocate,  forces 
for  maximum  effect. 

The combined knowledge of overall mission objectives and threat
detection makes controllers uniquely capable of resource
allocation. They need to assess the resources available and when
future resources might become available, balanced against
current and potential threats. They must synchronize their own
forces, and their efforts with respect to other controllers.
Higher level controllers have to trade off the utility of
multiple potential assignments for maximal effectiveness, while
low level controllers can only shout louder hoping to increase
the priority of their request for resource allocation.

Poorly coordinated attacks can be weak and ineffective. One way
C³I agents coordinate is by synchronizing attacks through timing
constraints. For example, in the CAS domain, when bombing in
tight proximity to friendly troops, timings must be accurate to
plus or minus ten seconds to avoid interference with friendly
troops.
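
As a small worked example of such a timing constraint, the check below flags any aircraft whose time on target falls outside the window. It is a sketch only: the function name `timing_violations` and the second callsign are hypothetical, and times are assumed to be seconds on a shared mission clock. Only the ten-second tolerance comes from the text.

```python
def timing_violations(planned, actual, tolerance_s=10.0):
    """Return callsigns whose actual time on target misses the planned time
    by more than `tolerance_s` seconds (plus or minus)."""
    return sorted(
        callsign
        for callsign, t in actual.items()
        if abs(t - planned[callsign]) > tolerance_s
    )
```

With a planned time of 600 s for both planes, an arrival at 607.5 s is inside the window and an arrival at 612 s is not.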

To accomplish this C³I agents must be capable of real-time
reactive planning. Threats, friendly forces, and messages from
other controllers may all arrive at any time. The overall battle
plan must be incrementally supplemented with new information so
that we seize opportunities and knowingly avoid or confront
risks.

Soar provides several capabilities which help manage these
real-time asynchronous inputs. First, the decision of what to do
next is handled through production rules. During each decision
cycle all relevant rules are tested and allowed to fire in
parallel. Thus the sequence of execution is not fixed.
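
The idea of testing all rules against the same situation and letting every match fire together can be illustrated with a toy sketch. This is not Soar's matcher or decision procedure; the function `fire_wave`, the representation of working memory as a set of strings, and the two example rules are assumptions made purely for illustration.

```python
def fire_wave(working_memory, rules):
    """Fire one parallel wave: every rule is tested against the same snapshot,
    and all resulting additions are applied together afterwards."""
    snapshot = frozenset(working_memory)
    additions = set()
    for condition, action in rules:
        if condition(snapshot):
            additions |= action(snapshot)
    return set(snapshot) | additions


# Two hypothetical rules; the second depends on the output of the first.
RULES = [
    (lambda wm: "bogey-detected" in wm, lambda wm: {"propose report-contact"}),
    (lambda wm: "propose report-contact" in wm, lambda wm: {"compose-summons"}),
]
```

Because every rule sees the same snapshot, the result of a wave does not depend on the order in which the rules are listed; a rule that depends on another rule's output simply fires in a later wave.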

The real-time requirement is handled by making the speed of
operator execution comparable to experimental results in humans
(Newell, 1990). Since this can only guarantee soft real-time,
our agents will react quickly, but may fail to react quickly
enough when faced with overly complex situations, just as people
do. Limiting the number of available choices increases the speed
of decision making. Soar uses operator subgoaling to provide a
context for focusing decisions on information relevant to the
current situation. For example, when under attack and bugging
out, an E-2 might not be overly concerned with planning the
course to its CAP station.

Another way to increase military effectiveness is to decrease
the interference from one’s own forces. In actual combat (as
opposed to simulation) this will have serious morale
consequences. The deconfliction duties we’ve implemented range
from air traffic control to route planning to explicitly
informing the plane of the location of friendly forces.

4.3.  Communication 

The nature of communication is that commands must be brief, and
commands must be clear. C³I agents must communicate relevant
information in a timely and effective manner. Communication can
range from simple (e.g., “proceed as briefed” or “negative”) to
very complex, such as a nine line brief shown in figure 1.

The domain of military communication is well researched, and
the military jargon provides a form of communication which is
brief yet maximizes the communication of necessary knowledge
without undue overhead. We attempt to model C³I using
standardized forms, realistic dialog from actual communications
of former pilots, and examples from training manuals whenever
possible. We believe that by making communication explicit and
Figure 1: Nine/twelve line brief


based  on  human  communication  we  can  offer  an 
approach  to  better  human  interaction  and  easier 
evaluation  of  the  results  of  a  simulation. 

The approach used by the military, and the approach we’ve
adopted, is to use a shared format for all communication.
Complex commands use a standard template to reduce transmission
time and ensure all relevant information has been communicated.

To compensate for lost messages and electronic interference we
repeat messages until confirmation is forthcoming. The receipt
of commands must be confirmed through “roger,” or if some action
is necessary, by the recipient either “wilco” (will comply) or
“negative” (will not comply).
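
The repeat-until-confirmed convention can be sketched as a short retry loop. The sketch is illustrative rather than a description of the agents' code: the function `send_until_confirmed`, the `transmit` callback, and the `max_attempts` cap are assumptions (the text only says messages are repeated until confirmation arrives). The three acknowledgement words are the ones named above.

```python
CONFIRMATIONS = ("roger", "wilco", "negative")


def send_until_confirmed(message, transmit, max_attempts=5):
    """Repeat `message` until a recognised confirmation comes back.

    `transmit(message)` is a caller-supplied function that sends the message
    and returns the reply heard, or None if nothing usable was received.
    Returns (reply, attempts_used); reply is None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        reply = transmit(message)
        if reply in CONFIRMATIONS:
            return reply, attempt
    return None, max_attempts
```

A lossy channel that drops the first two transmissions and then answers “wilco” yields `("wilco", 3)`.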

While we have yet to incorporate a general natural language
understanding system with TacAir-Soar, the commands used are
based on the actual English communications used between
controllers and pilots in similar situations. This makes it
easier to understand the behavior of the IFOR commanders, and
allows human communication with the IFOR commanders. In order to
communicate with other CGFs we will be adopting CCSIL protocols
(Salisbury, 1995).

4.4. Intelligence

The most important responsibility of an air controller is to
locate, identify, and track threats. “Timely interception is
totally dependent of two factors: early detection and positive
identification” (Gunston & Spick, 1983). The need to track the
threat arises because enemy agents are eminently uncooperative.
Some early failures of our fighter agents acting alone arose
because human pilots would feign an attack from one direction,
then beam or drop and attack from a different direction. The
more powerful radar capabilities of the AIC and GCI make our
agents less vulnerable to these tactics.

Each agent has limited capability. Controllers are limited by
weapons,^ maneuverability, and speed when compared with the
targets they must defend against. To compensate for this lack of
ability they provide greater situational awareness either
through proximity (e.g., a FAC) or superior equipment (such as
an E-2’s radar). They must use this awareness to perform
continuous intelligence gathering. Without this information even
a veteran pilot may be defeated by a poorly equipped pilot of
lesser training.

5. Example scenario

Figures 2 through 8 illustrate some of the interaction between
command agents and combat aircraft during a close-air support
mission. All of this dialog is taken from a simulation run of a
close air support mission.

Our agents include a section of F-14D fighters (led by
Falcon14), a TACC (Icepack), an FSCC (Bronco), a DASC (Mustang)
and a FAC (Rattler). Each utterance is preceded by the name of
the speaker and the radio frequency used for this communication.
The frequencies are color coded to match the encryption scheme
used in the communication.

In figure 2 the two planes check into the amphibious operations
area (AOA) with Icepack. The exact form of the plane’s initial
check-in message is specified in the SPINs (SPecial
INstructions) and
may  vary  across  scenarios,  but  will  convey  the  es- 

^Though, at least one E-2 pilot considers every friendly
plane in the sky his weapon.


Falcon14 (white): Icepack this-is Falcon14
Icepack (white): go-ahead
Falcon14 (white): Falcon14
Falcon14 (white): mission-number 20-069
Falcon14 (white): proceeding-to Elmer
Falcon14 (white): angels 32
Falcon14 (white): time-on-station 1+30
Falcon14 (white): checking-in-as fragged
Icepack (white): roger
Icepack (white): Falcon14
Icepack (white): radar-contact
Icepack (white): cleared-to-enter-aoa
Icepack (white): proceed-as-briefed
Icepack (white): maintain angels 32
Icepack (white): check-in-with Mustang
Icepack (white): on orange
Icepack (white): at Tiger
Falcon14 (white): wilco

Figure  2:  Mission  checks  in  to  AOA 

sential  information  1)  who  I  am,  2)  where  I  am, 
and  3)  what  am  I  doing  here. 

Icepack  recognizes  this  message  and  realizes  that 
they  are  both  friendly  and  supposed  to  be  there. 
Icepack  locates  their  corresponding  blip  on  radar, 
gives  them  permission  to  enter  the  AOA,  and  does 
not  change  their  mission. 

Our  TACC  is  capable  of  some  low  level  air  traffic 
control.  In  this  case  it  consists  of  assigning  unique, 
even  altitudes  to  inbound  flights,  while  outbound 
flights  are  expected  to  maintain  odd  altitudes. 
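
The altitude rule in the preceding paragraph can be made concrete with a short sketch. It is an assumption-laden illustration, not the TACC agent's code: the class name `InboundAltitudes`, the altitude band (angels 20 to 40), the policy of honoring a requested altitude when it is free, and the extra callsigns in the usage note are all hypothetical. Only the rule itself (unique, even altitudes for inbound flights) comes from the text.

```python
class InboundAltitudes:
    """Hands out unique, even altitudes (in angels) to inbound flights."""

    def __init__(self, floor=20, ceiling=40):
        # Odd altitudes are left to outbound traffic and never handed out here.
        self._free = [a for a in range(floor, ceiling + 1) if a % 2 == 0]
        self._assigned = {}

    def assign(self, callsign, requested=None):
        """Return the flight's altitude, keeping `requested` if it is free."""
        if callsign in self._assigned:
            return self._assigned[callsign]
        if not self._free:
            raise RuntimeError("no even altitude left for inbound traffic")
        angels = requested if requested in self._free else self._free[0]
        self._free.remove(angels)
        self._assigned[callsign] = angels
        return angels

    def release(self, callsign):
        """Free the altitude once the flight has left the area."""
        self._free.append(self._assigned.pop(callsign))
        self._free.sort()
```

In this sketch a flight checking in at angels 32 keeps that altitude, as Falcon14 does in figure 2, while a second flight asking for the same altitude, or for an odd one, is moved to the lowest free even altitude.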

Finally, Icepack hands off control to the next agent, Mustang,
at a pre-briefed radio setting.

Rattler (silver): Bronco this-is Rattler
Rattler (silver): immediate-mission
Rattler (silver): target-is tank
Rattler (silver): target-location-is
Rattler (silver): x 127000
Rattler (silver): y 27600
Rattler (silver): target-time ASAP
Rattler (silver): desired-results destroy
Rattler (silver): final-control FAC Rattler
Rattler (silver): on green
Bronco (silver): roger Rattler

Figure  3:  FAC  sends  tactical  air  request  to  FSCC 
In figure 3 Rattler finds itself in the line of unfriendly fire
and radios back to the FSCC that it needs support immediately.
In addition it provides information sufficient for the FSCC to
initiate a Joint Tactical Airstrike Request (JTAR).^

The JTAR includes target type, location, time, and desired
results. Note that Rattler has elected to be the forward air
controller for the mission and direct the final bombing run. The
FSCC supplements this information with coordination and mission
data.
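
The templated radio request of figure 3 lends itself to a simple data-structure sketch. The code below is an illustration only, not the agents' message code or the ModSAF interface: the class `TacticalAirRequest`, its field names, and the `lines` method are hypothetical. The line keywords (`target-is`, `target-location-is`, and so on) and the example values are copied from the figure 3 transcript.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TacticalAirRequest:
    target: str
    x: int
    y: int
    target_time: str
    desired_results: str
    final_control: str
    control_frequency: str
    priority: str = "immediate-mission"

    def lines(self, sender, receiver, channel):
        """Render the request as one radio utterance per template slot."""
        body = [
            f"{receiver} this-is {sender}",
            self.priority,
            f"target-is {self.target}",
            "target-location-is",
            f"x {self.x}",
            f"y {self.y}",
            f"target-time {self.target_time}",
            f"desired-results {self.desired_results}",
            f"final-control {self.final_control}",
            f"on {self.control_frequency}",
        ]
        return [f"{sender} ({channel}): {line}" for line in body]
```

Rendering a request for a tank at (127000, 27600) from Rattler to Bronco on the silver frequency reproduces the ten request lines of figure 3. Because every slot is always emitted in the same order, the receiver can tell immediately when a line has been lost.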

Bronco (orange): Mustang this-is Bronco
Bronco (orange): request-number 28-59
Bronco (orange): immediate-mission
Bronco (orange): target-is tank
Bronco (orange): target-location-is
Bronco (orange): x 127000
Bronco (orange): y 27500
Bronco (orange): target-time ASAP
Bronco (orange): desired-results destroy
Bronco (orange): final-control FAC Rattler
Bronco (orange): on green
Mustang (orange): roger

Figure  4:  FSCC  radios  DASC 

In figure 4 Bronco (the FSCC) has determined that close air support is
the logical response, and transmits the necessary information from the
JTAR to Mustang (the DASC). If this were more realistic, the request
would be transmitted in hard copy form rather than over the radio, but
we are constrained by the information exchanges allowable through
ModSAF.

In  figure  5  the  lead  plane  is  approaching  a  holding 
point  and  checks  in  with  Mustang.  The  plane’s 
check  in  sequence  has  the  same  form  as  seen  in 
figure  2. 

At  this  stage  Mustang  alters  the  mission  from  its 
pre-specified  course.  Even  though  the  planes  have 
a  pre-briefed  mission,  Mustang  determines  that 
the  new  mission  is  more  important  and  redirects 
the  flight  to  a  new  contact  point  (Chevy)  and  a 
new  controller  (Rattler)  for  further  details. 


³ We've elected not to include an example of a Joint Tactical Airstrike
Request because of its detailed nature. The nine/twelve-line brief of
figure 1 accounts for less than one sixth of its content by size.


Falcon14 (orange): this-is Falcon14
Mustang (orange): go-ahead
Falcon14 (orange): Falcon14
Falcon14 (orange): mission-number 20-059
Falcon14 (orange): proceeding-to Tiger
Falcon14 (orange): angels 32
Falcon14 (orange): time-on-station 1+30
Falcon14 (orange): checking-in-as fragged
Mustang (orange): Falcon14 this-is Mustang
Mustang (orange): proceed-as-briefed
Mustang (orange): check-in-with Rattler
Mustang (orange): on green at Chevy
Falcon14 (orange): wilco

Figure  5:  Mission  checks  in  with  DASC 

Mustang (green): Rattler this-is Mustang
Rattler (green): go-ahead
Mustang (green): expect-cas-mission 20-059
Mustang (green): call-sign Falcon14
Mustang (green): at Chevy
Rattler (green): roger

Figure  6:  DASC  contacts  FAC 

Figure 6 shows Mustang informing Rattler that help is on the way, who
they are, and where to expect them. Rattler has no radar and will
assume a plane approaching from that direction is the expected mission.

In figure 7 the planes finally arrive at the contact point for Rattler
and check in according to the format seen in figure 2.

Figure 8 shows Rattler delivering a nine-line brief similar to that
shown in figure 1. This is an information-intensive message which
relies on the controller and pilot sharing a common communication
model. All and only the necessary values are given sequentially without
reference to meaning or line numbers.

What's being expressed here is that the initial point will be Joyce.
The heading, in magnetic degrees, from the initial point to the target
is 052. The distance from the initial point to the target is 18.6
nautical miles. The target's elevation is 0 above mean sea level. The
target's description is a "tank". The target's coordinates are 127000
by 27500 in the X/Y coordinate system of ModSAF.

Falcon14 (green): Rattler this-is Falcon14
Rattler (green): go-ahead
Falcon14 (green): Falcon14
Falcon14 (green): mission-number 20-059
Falcon14 (green): 2 F-14d
Falcon14 (green): holding-at Chevy
Falcon14 (green): angels 32
Falcon14 (green): 10 MK82
Falcon14 (green): time-on-station 1+30
Falcon14 (green): no-laser-capability
Rattler (green): roger
Rattler (green): Falcon14


Figure  7:  Mission  check  in  with  FAC 


Rattler (green): standing-by
Rattler (green): with-9-line-brief
Falcon14 (green): ready-to-copy
Rattler (green): Joyce
Rattler (green): 052
Rattler (green): 18.6
Rattler (green): 0
Rattler (green): tank
Rattler (green): x 127000 y 27500
Rattler (green): wp
Rattler (green): SW 8000 meters
Rattler (green): Ford
Rattler (green): tot ASAP
Falcon14 (green): ASAP


Figure  8:  FAC  gives  9  line  brief 


The target will be marked with white phosphorus.⁴ There are friendlies
in the area which are 8000 meters to the south-west. After the attack
the plane should egress through Ford. And the attack should commence as
soon as possible.
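
Because the values arrive in a fixed order without labels, the receiver can recover their meaning purely by position. The following sketch shows that idea under stated assumptions: the field names are our own labels for the nine positions described in the text, and `decode_nine_line` is a hypothetical helper rather than part of the agents' implementation.

```python
# Assumed labels for the nine positional values, in transmission order.
NINE_LINE_FIELDS = [
    "initial_point",
    "heading_deg_magnetic",
    "distance_nm",
    "target_elevation_msl",
    "target_description",
    "target_location",
    "mark_type",
    "friendlies",
    "egress",
]


def decode_nine_line(values):
    """Attach a meaning to each value according to its position in the brief."""
    if len(values) != len(NINE_LINE_FIELDS):
        raise ValueError("a nine-line brief must carry exactly nine values")
    return dict(zip(NINE_LINE_FIELDS, values))


brief = decode_nine_line([
    "Joyce", "052", "18.6", "0", "tank",
    "x 127000 y 27500", "wp", "SW 8000 meters", "Ford",
])
```

The time on target follows the nine values as a keyed remark ("tot ASAP"), so it is left out of the positional list in this sketch.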

Falcon14 signals that he copies all of that information and agrees to
it by repeating the time.

Following this, there are brief exchanges when the planes are spotted,
cleared to drop, and for damage assessment.


6. Research Issues in C3I

The development of C3I agents presents several interesting research
issues.

From a broader artificial intelligence perspective, C3I presents
interesting problems in reactive planning and managing dynamically
changing goals in the face of uncertainty. The battlefield environment
is constantly changing. This requires a fast and efficient architecture
to keep up with the speed requirements of the situation as well as a
flexible architecture for incremental reasoning and reactive planning.

Most of the planning currently done by our system is reactive planning.
In some situations the C3I agents may have some time for decision
making and should use this time for more deliberate planning. Recent
research explores the possibility of incorporating planning and
means-ends analysis mechanisms with our agents (van Lent, 1995; Wray,
1995).

This work is very closely related to distributed artificial
intelligence. Since we are basing our work on an existing model which
seems to work reasonably well, we can avoid many of the problems of
distributed artificial intelligence systems. For example, our agents
need not carry out protracted negotiations.

We've demonstrated that a template-driven approach to language
understanding provides a sufficiently flexible command language for
many aspects of communication, but it's not clear how far this approach
can be extended. More work needs to be done on natural language
understanding, both for agent flexibility and for ease of use in
human-computer interaction. See (Lehman et al., 1995) for recent work.

Though these agents were prepared to take part in the STOW-E
demonstration, during rehearsal they were unable to handle the large
number of other agents they saw in the world and crashed. This turned
out to be a buffer overflow problem, but suggested several methods for
reorganizing the way IFOR agents handle large numbers of inputs.
Currently, these IFOR agents will slow down and their performance will
degrade as the number of other agents they have to consider increases.

In the immediate future we will address the more mundane, but no less
critical, tasks of tracking fuel states and allocating fuel assets.


⁴ The capability for marking a target does not yet exist.


7. Discussion

We have described the current state of development of C3I agents used
by Soar/IFOR. We have shown how the agents currently implemented
demonstrate the specific aspects of the C3I domain. Finally, we worked
through an example which showed multiple control agents interacting
with planes on a close air support mission.

We have demonstrated an ability to cope with incomplete knowledge and
incrementally supplement information as it becomes available. This
requires continuous situation assessment: commands, threats, and
resources may arrive at any time.

We believe that automation must be pushed up the command hierarchy. As
the number of simulated agents grows, people will have to supervise
larger numbers of agents. We believe that the best way to do this is to
emulate the present military command hierarchy. This has the advantage
of ease of use (nothing new to learn), effectiveness (it has been
proven through centuries of warfare), and ease of understanding.

Abstract

Agent tracking is an important capability an intelligent agent requires
for interacting with other agents. It involves monitoring the
observable actions of other agents as well as inferring their
unobserved actions or high-level goals and behaviors. This paper
focuses on a key challenge for agent tracking: recursive tracking of
individuals or groups of agents. The paper first introduces an approach
for tracking recursive agent models. To tame the resultant growth in
the tracking effort and aid real-time performance, the paper then
presents model sharing, an optimization that involves sharing the
effort of tracking multiple models. Such shared models are dynamically
unshared as needed; in effect, a model is selectively tracked if it is
dissimilar enough to require unsharing. The paper also discusses the
application of recursive modeling in service of deception, and the
impact of sensor imperfections.

This investigation is based on our on-going effort to build intelligent
pilot agents for a real-world synthetic air-combat environment.¹

1  Introduction 

In dynamic, multi-agent environments, an intelligent agent often needs
to interact with other agents to achieve its goals. Agent tracking is
an important requirement for intelligent interaction. It involves
monitoring other agents' observable actions as well as inferring their
unobserved actions or high-level goals, plans and behaviors.

Agent tracking is closely related to plan recognition (Kautz & Allen
1986; Azarewicz et al. 1986), which involves recognizing agents' plans
based on observations of their actions. The key difference is that
plan-recognition efforts typically focus on tracking a narrower
(plan-based) class of agent behaviors, as seen in static, single-agent
domains. Agent tracking, in contrast, can involve tracking a broader
mix of goal-driven and reactive behaviors (Tambe & Rosenbloom 1995).
This capability is important for dynamic environments where agents do
not rigidly follow plans.

¹ Thanks to Paul Rosenbloom and Ben Smith for detailed feedback on this
effort. Thanks also to Lewis Johnson, Piotr Gmytrasiewicz and the
anonymous reviewers for helpful comments. This research was supported
under subcontract to the University of Southern California Information
Sciences Institute from the University of Michigan, as part of contract
N00014-92-K-2015 from the Advanced Systems Technology Office of the
Advanced Research Projects Agency and the Naval Research Laboratory.

This paper focuses on the issues of recursive agent and agent-group
tracking. Our investigation is based on an on-going effort to build
intelligent pilot agents for simulated air-combat (Tambe et al. 1995).
These pilot agents execute missions in a simulation environment called
ModSAF, which is being commercially developed for the military (Calder
et al. 1993). ModSAF provides a synthetic yet real-world setting for
studying a broad range of challenging issues in agent tracking. By
investigating agents that are successful at agent tracking in this
environment, we hope to extract some general lessons that could
conceivably be applied in other synthetic or robotic multi-agent
environments (Kuniyoshi et al. 1994; Bates, Loyall, & Reilly 1992).

For an illustrative example of agent tracking in the air-combat
simulation environment, consider the scenario in Figure 1. The pilot
agent L in the light-shaded aircraft is engaged in combat with pilot
agents D and E in the dark-shaded aircraft. Since the aircraft are far
apart, L can only see its opponents' actions on radar (and vice versa).
In Figure 1-a, L observes its opponents turning their aircraft in a
coordinated fashion to a collision course heading (i.e., with this
heading, they will collide with L at the point shown by x). Since the
collision course maneuver is often used to approach one's opponent, L
infers that its opponents are aware of its (L's) presence, and are
trying to get closer to fire their missiles. However, L has a missile
with a longer range, so L reaches its missile range first. L then turns
its aircraft to point straight at D's aircraft and fires a radar-guided
missile at D (Figure 1-b). Subsequently, L executes a 35° fpole turn
away from D's aircraft (Figure 1-c), to provide radar guidance to its
missile, while slowing its rate of approach to enemy aircraft.

Figure 1: Pilot agents D and E are attacking L. An arc on an aircraft's
nose shows its turn direction.

While neither D nor E can observe this missile on their radar, they do
observe L's pointing turn followed by its fpole turn. They track these
to be part of L's missile firing behavior, and infer a missile firing.
Therefore, they attempt to evade this missile by executing a 90° beam
turn (Figure 1-d). This causes their aircraft to become invisible to
L's radar. Deprived of radar guidance, L's missile is rendered
harmless. Meanwhile, L tracks its opponents' coordinated beam turn in
Figure 1-d, and prepares counter-measures in anticipation of the likely
loss of its missile and radar contact.

Thus, the pilot agents need to continually engage in agent tracking.
They need to track their opponents' actions, such as turns, and infer
unobserved actions and high-level goals and behaviors, such as the
fpole, beam or missile firing behaviors. This paper focuses on two key
issues in agent tracking in this environment:

•  Recursive agent tracking: Pilot agents continually influence each
other's behaviors, creating a need for recursive tracking. For
instance, in Figure 1-d, to successfully track D's beam, L must also
recursively track how D is likely to be tracking L's own actions: that
D is aware of L's missile firing, and it is beaming in response. Such
recursive tracking may also be used in service of deception, and in
addressing other agents' realistic sensor (radar) limitations.

•  Agent group tracking: An agent may need to track coordinated (or
uncoordinated) activities of a group of agents, e.g., as just seen, L
needed to track two coordinated opponents.

To address these issues, this paper first presents an approach for
recursive tracking of an individual or a group of agents. This approach
builds upon RESC, a technique for real-time tracking of flexible and
reactive behaviors of individual agents in dynamic environments. RESC
is a real-time, reactive version of the model tracing technique used in
intelligent tutoring systems: it involves executing a model of the
tracked agent, and matching predictions with actual observations
(Anderson et al. 1990; Ward 1991; Hill & Johnson 1994).
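
The core loop of model tracing (run a model of the other agent, then compare what it predicts with what is observed) can be sketched in a few lines. This is a deliberately simplified illustration, not RESC itself: the toy `beam_model`, the state keys, and the action names are all invented for the example.

```python
def beam_model(state):
    """Toy model of an opponent: beam once a missile firing has been inferred."""
    if state["missile_inferred"]:
        return "turn-to-beam"
    return "maintain-collision-course"


def first_mismatch(model, states, observations):
    """Execute the model step by step and match predictions to observations.

    Returns the index of the first observation the model fails to predict,
    or None if every observation matches.
    """
    for step, (state, observed) in enumerate(zip(states, observations)):
        if model(state) != observed:
            return step
    return None


states = [{"missile_inferred": False}, {"missile_inferred": True}]
consistent = first_mismatch(
    beam_model, states, ["maintain-collision-course", "turn-to-beam"])
surprising = first_mismatch(
    beam_model, states, ["maintain-collision-course", "maintain-collision-course"])
```

A mismatch is the signal that the current interpretation of the other agent needs to be revised; what to do after that point is the subject of the ambiguity resolution work cited in the text.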

Unfortunately, recursive agent-group tracking leads to a large growth
in the number of models. Executing all of these models would be in
general highly problematic. The problem is particularly severe for a
pilot agent, given that it has to track opponents' maneuvers and
counter them in real-time, e.g., by going beam to evade a missile fired
at it. Thus, for executing recursive models (and for a practical
investigation of recursive tracking), optimizations for real-time
performance are critical. Previous work on optimizations for agent
tracking has mostly focused on intra-model (within a single model)
optimizations, e.g., heuristic pruning of irrelevant operators (Ward
1991), restricted backtrack search (Tambe & Rosenbloom 1995), and
abstraction (Hill & Johnson 1994). In contrast, this paper proposes
inter-model (across multiple models) optimizations. It introduces an
inter-model optimization called model sharing, which involves sharing
the effort of tracking multiple models. Shared models are dynamically
unshared when required. In essence, a model is selectively tracked if
it is dissimilar enough to warrant unsharing. The paper subsequently
discusses the application of recursive models in service of deception.
This analysis is followed up with some supportive experiments.

The descriptions in this paper assume the perspective of the automated
pilot agent L, as it tracks its opponents. They also assume ideal
sensor conditions, where agents can perfectly sense each other's
maneuvers, unless otherwise mentioned. Furthermore, the descriptions
are provided in concrete terms using implementations of a pilot agent
in a system called TacAir-Soar (Tambe et al. 1995), built using the
Soar architecture (Newell 1990; Rosenbloom et al. 1991). We assume some
familiarity with Soar's problem-solving model, which involves applying
operators to states to reach a desired state.

2  Recursive  Agent  Tracking 

One key idea in RESC is the uniform treatment of an agent's generation
of its own behavior and tracking of other agents' behaviors. As a
result, the combination of architectural features that enables an agent
to generate flexible goal-driven and reactive behaviors is reused for
tracking others' flexible and reactive behaviors. This uniformity is
extended in this section in service of recursive agent tracking.

To illustrate this idea, we first describe L's generation of its own
behaviors, using the situation in Figure 1-d, just before the agents
lose radar contact with each other. Figure 2-a illustrates L's operator
hierarchy when executing its fpole. Here, at the top-most level, L is
executing its mission (to defend against intruders) via the
execute-mission operator. Since the termination condition of this
operator (completion of L's mission) is not yet achieved, a subgoal is
generated.² Different operators are available in this subgoal, such as
follow-flight-path, intercept, and run-away. L selects the intercept
operator to combat its opponent D. In service of intercept, L applies
the employ-missile operator in the next subgoal. Since a missile has
been fired, the fpole operator is selected in the next subgoal to guide
the missile with radar. In the final subgoal maintain-heading is
applied, causing L to maintain heading (Figure 1-d). All these
operators, used for generating L's own actions, will be denoted with
the subscript L, e.g., fpole_L. Operator_L will denote an arbitrary
operator of L. State_L will denote the global state shared by all these
operators. Together, state_L and the operator_L hierarchy constitute
L's model of its present dynamic self, referred to as model_L.

² If an operator's termination conditions remain unsatisfied, a subgoal
gets created. If these termination conditions are satisfied by future
state changes, then the operator and all its subgoals are terminated.


Figure 2: (a) Model_L; (b) Model_LD; (c) Model_LDL.


Model_L supports L's flexible/reactive behaviors, given Soar's
architectural apparatus for operator selection and termination
(Rosenbloom et al. 1991). L reuses this apparatus in tracking its
opponents' behaviors. Thus, L uses a hierarchy such as the one in
Figure 2-b to track D's behaviors. Here, the hierarchy represents L's
model of D's current operators in the situation in Figure 1-d. These
operators are denoted with the subscript LD. This operator_LD
hierarchy, and the state_LD that goes with it, constitute L's model of
D, or model_LD. Model_LD obviously cannot and does not directly
influence D's actual behavior; it only tracks D's behavior. For
instance, in the final subgoal, L applies the start-&-maintain-turn_LD
operator, which does not cause D to turn. Instead, this operator
predicts D's action and matches the prediction with D's actual action.
Thus, if D starts turning right towards beam, then there is a match
with model_LD: L believes that D is turning right to beam and evade its
missile, as indicated by other higher-level operators in the
operator_LD hierarchy. Note that, in reality, from L's perspective,
there is some ambiguity in D's right turn in Figure 1-d: it could be
part of a big 150° turn to run away given L's missile firing. To
resolve such ambiguity, L adopts several techniques, such as assuming
the worst-case hypothesis about its enemy, which in this case is that D
is beaming rather than running away. We will not discuss RESC's
ambiguity resolution any further in this paper (see (Tambe & Rosenbloom
1995) for more details).

Thus, with the RESC approach, L tracks D's behaviors by continuously
executing the operator_LD hierarchy, and matching it against D's
actions. To recursively track its own actions from D's perspective, L
may apply the same technique to its recursive model_LDL (L's model of
D's model of L) as shown in Figure 2-c. Model_LDL consists of an
operator_LDL hierarchy and state_LDL. The important point here is the
uniform treatment of the operator_LDL hierarchy, on par with the
operator_LD and operator_L hierarchies, to support the tracking of
flexible and reactive behaviors. L tracks model_LDL by matching
predictions with its own actions. Further recursive nesting leads to
the tracking of model_LDLD and so on. To track additional opponents,
e.g., the second opponent E, L tracks additional models, such as
model_LE. L may also track model_LEL, model_LDE, model_LED, etc. for
recursive tracking.

Recursive tracking is key to tracking other agents' behaviors in
interactive situations. Thus, it is L's recursive tracking of fpole_LDL
which indicates a missile firing to model_LD and causes the selection
of evade-missile_LD to track D's missile evasion. Note that in Figure
2-c, ambiguity resolution in model_LDL leads to an operator_LDL
hierarchy that is identical to the operator_L hierarchy. One key
ambiguity resolution strategy is again the worst-case assumption: given
ideal sensor situations, L assumes D can accurately track L's
behaviors. Thus, among possible options in the operator_LDL hierarchy,
the one identical to operator_L gets selected. However, these
hierarchies may not always be identical and the differences between
them may be exploited in service of deception, at least in adversarial
situations. These possibilities are discussed in more detail in
Section 5.1.

3  Executing Models in Real-time

Unfortunately, the recursive tracking scheme introduced in the previous
section points to an exponential growth in the number of models to be
executed. In general, for N opponents, and r levels of nesting
(measured with r = 1 for model_L, r = 2 for model_LD, and so on), the
pilot agent L may need to execute $\sum_{i=0}^{r-1} N^i$ models (which
is r for N = 1, but $(N^r - 1)/(N - 1)$ for N > 1). This is clearly
problematic given the likely scale-up in N. In particular, given its
limited computational resources, L may be unable to execute relevant
operators from all its models in real-time, jeopardizing its survival.
In fact, as seen in Section 6, L may run into resource contention
problems while executing just five models, indicating possible
difficulties even for small N and r.
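
The growth can be made concrete with a small calculation. The sketch below assumes that every model at one nesting level gives rise to N models at the next level, which is consistent with the count being r when N = 1; the function name `models_to_execute` is ours.

```python
def models_to_execute(n_opponents, depth):
    """Count models over nesting levels 1..depth.

    Level 1 holds one model (the agent's own); each further level is
    assumed to hold N times as many models as the level above it.
    """
    return sum(n_opponents ** i for i in range(depth))


single_opponent = models_to_execute(1, 3)   # one model per level
two_opponents = models_to_execute(2, 3)     # 1 + 2 + 4
four_opponents = models_to_execute(4, 3)    # 1 + 4 + 16
```

Even at a nesting depth of three, going from two to four opponents triples the number of models, which is why some form of selectivity is needed.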

Thus, optimizations involving some form of selective tracking appear
necessary for real-time execution of these models. Yet, such
selectivity should not cause an agent to be completely ignorant of
critical information that a model may provide (e.g., an agent should
not be ignorant of an opponent's missile firing). To this end, this
paper focuses on an optimization called model sharing. The overall
motivation is that if there is a model_y that is near-identical to a
model_x, then model_y's states and operators can be shared with those
of model_x. Thus, model_y is tracked via the execution of model_x,
cutting the tracking effort in half. Model_y may be dynamically
unshared from model_x if it grows significantly dissimilar. Thus, a
model is selectively executed based on its dissimilarity with other
models.

For an illustration of this optimization, consider model_L and
model_LDL, as shown in Figure 2. The operator_LDL hierarchy can be
shared with the operator_L hierarchy since the two are identical. (In
low-level implementation terms, sharing an operator_L involves adding a
pointer indicating it is also a part of model_LDL.) Furthermore,
information in state_LDL is shared with state_L. Thus, L essentially
executes operators from only one model, instead of two.

Given the efficiency benefits from sharing, it is often useful to
abstract away from some of the differences between models in order to
enable sharing. However, such abstraction may not be possible for some
static and/or dynamic aspects of the models. One important aspect
relates to private information. In particular, in their unshared
incarnations, models have their indices organized so as to prevent a
breach of privacy, e.g., model_LD can access information in model_LDL,
but not model_L. Model sharing could potentially breach such privacy.
Thus, for instance, if state_L maintains secret missile information,
sharing it with state_LDL would allow model_LD to access that secret.
To prevent model sharing from breaching such privacy, some aspects of
the shared models may be explicitly maintained in an unshared fashion.
Thus, if L's missile range is (a secret) 30 miles, but L believes D
believes it is 50 miles, then the missile range is maintained
separately on state_L and state_LDL. Figure 3 shows the resulting
shared models with an unshared missile range.
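
One way to picture a state that is shared except for a few explicitly unshared entries is a thin view over the base state. The sketch below is only an analogy in Python, not the Soar working-memory mechanism: the `SharedState` class and the key names are hypothetical, while the 30 and 50 mile ranges come from the example in the text.

```python
class SharedState:
    """View of a base state that shares every entry except explicit overrides."""

    def __init__(self, base, unshared=None):
        self._base = base                     # shared by reference, never copied
        self._unshared = dict(unshared or {})  # entries kept separately

    def get(self, key):
        # Unshared entries shadow the base, so the secret value never leaks.
        if key in self._unshared:
            return self._unshared[key]
        return self._base[key]


state_L = {"heading": 52, "missile_range_miles": 30}
state_LDL = SharedState(state_L, unshared={"missile_range_miles": 50})

# An update to a shared entry is made once and seen through both states.
state_L["heading"] = 90
```

Only the overridden entries cost extra storage and bookkeeping; everything else is maintained a single time, which is where the savings from sharing come from.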

Figure 3: Sharing model_L with model_LDL.

Such sharing among models is related to the sharing of belief spaces in
the SNePS belief representation system (Shapiro & Rapaport 1991). One
key difference is dynamic model unsharing. In particular, while some
aspects of the models are static (e.g., the statically unshared missile
ranges above), other aspects, particularly those relating to operators,
are highly dynamic. As a result, shared components may need to be
dynamically unshared when dissimilar. Ideally, any two models could be
merged (shared) when they are near-identical, and dynamically unshared
in case of differences. This would be ideal selectivity: a model is
tracked if it requires unsharing. However, in practice, both unsharing
and merging may involve overheads. Thus, if an agent greedily attempts
to share any two models whenever they appear near-identical, it could
face very heavy overheads. Instead, it has to selectively share two
models over a time period Δ so that the savings from sharing outweigh
the cost of dynamic unsharing and re-merging during Δ. In particular,
suppose there are two models, model_x and model_y, that are unshared
over n sub-intervals of Δ, δ_1, ..., δ_n, but shared during the rest of
Δ. Further, suppose cost_θ(M) is the cost of executing a model M over a
time interval θ; cost(unshare) and cost(merge) are the overheads of
unsharing and merging respectively; and cost_θ(detect) is the cost
incurred during θ of deciding if shared models need to be unshared or
if unshared models can be merged (this may potentially involve
comparing two different models and deciding if sharing is
cost-effective). Then, in sharing model_y with model_x during Δ,
benefits outweigh costs iff:

$$\mathrm{cost}_{\Delta}(\mathrm{model}_y) - \sum_{i=1}^{n} \mathrm{cost}_{\delta_i}(\mathrm{model}_y) > n \times \mathrm{cost}(\mathrm{unshare}) + n \times \mathrm{cost}(\mathrm{merge}) + \mathrm{cost}_{\Delta}(\mathrm{detect}) \qquad (1)$$
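
The trade-off in inequality (1) can be checked numerically once the individual costs are known. The sketch below assumes the reading in which the savings are the cost of executing the shared model over the whole period minus its cost over the sub-intervals in which it was unshared; the function name, argument names, and the numbers are illustrative only.

```python
def sharing_pays_off(cost_full, costs_unshared, cost_unshare, cost_merge, cost_detect):
    """Return True if sharing a model over a period beats tracking it separately.

    cost_full      -- cost of executing the model over the whole period
    costs_unshared -- cost of executing it in each unshared sub-interval
    cost_unshare   -- overhead of one unsharing
    cost_merge     -- overhead of one merge
    cost_detect    -- cost, over the period, of deciding when to unshare or merge
    """
    n = len(costs_unshared)
    savings = cost_full - sum(costs_unshared)
    overhead = n * cost_unshare + n * cost_merge + cost_detect
    return savings > overhead


# Mostly shared: two short unshared sub-intervals, so savings dominate.
mostly_shared = sharing_pays_off(100, [10, 5], 3, 4, 6)
# Mostly unshared: little is saved, and the overheads outweigh it.
mostly_unshared = sharing_pays_off(100, [45, 45], 3, 4, 6)
```

The second case shows why greedy sharing is risky: a pair of models that keeps diverging saves little execution effort but still pays for every unshare and merge.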

While, ideally, agents may themselves evaluate this equation, our
agents are unable to do so at present. Therefore, candidate categories
of models, with high likelihood of sharing benefits outweighing costs,
are supplied by hand. Nonetheless, agents do determine specific models
within these categories that may be shared, and implement the actual
sharing and dynamic unsharing. The categories are:

1.  Models  of  distinct  agents  at  the  recursive  depth  of 
r  =  2:  If  a  group  of  agents,  say  D  and  E,  together 
attack  L,  modelr  and  modelj^g  may  be  possibly 
shared.  Thus,  if^all  models  of  its  N  opponents  are 
shared,  L  may  need  track  only,  one  model  at  r  =  2; 
if  not  shared,  L  may  track  N  models  at  r  =  2. 

2. Recursive models of a single agent at r ≥ 3: For
instance, model_LDL and model_LEL may be shared
with model_L. Similarly, model_LDE may be shared
with model_LE, etc. Models at recursive depth r ≥ 3
may be shared with models at r = 1 or 2. If all such
models are shared, L may need to track no models at
r ≥ 3.

The end result is that an agent L may track a group
of N agents (with sharable models) with just two models
(model_L at r = 1, and one model at r = 2, with the rest
all shared) instead of O(N^r) models. If the models of
the N agents are not sharable, then it may still need
just N + 1 models, given the sharing in the recursion
hierarchy. Thus, sharing could provide substantial
benefits in tracking even for small N and r. In the
following, Section 4 examines in more detail the model
sharing within a group, and Section 5 examines sharing
within and across recursion hierarchies.

4 Sharing in Agent-Group Tracking

Agents  that  are  part  of  a  single  group  often  act  in  a 
coordinated  fashion  —  executing  similar  behaviors  and 
actions  —  and  thus  provide  a  possible  opportunity  for 
model sharing. For instance, if D and E are attacking
L in a coordinated fashion, they may fly in formation,
execute similar maneuvers, etc. However, their actions
are not perfectly identical (there are always some small
delays in coordination, for instance), which can be a
possible hindrance to sharing. If the delays and
differences  among  the  agents’  actions  are  small,  they 
need  to  be  abstracted  away,  to  facilitate  model  sharing. 
Yet,  such  abstraction  should  allow  tracking  of  essential 
group  activities. 

To  this  end,  one  key  idea  to  track  an  agent-group 
is  to  track  only  a  single  paradigmatic  agent  within  the 
group.  Models  of  all  other  agents  within  the  group 
are  then  shared  with  the  model  of  this  paradigmatic 
agent.  Thus,  a  whole  group  is  tracked  by  tracking  a 
single  paradigmatic  agent.  For  example,  suppose  L 
determines  one  agent  in  the  attacking  group,  say  D, 
to be the paradigmatic agent. It may then track only
model_LD, and share other models, such as model_LE,
with model_LD, reducing its tracking burden.

Such model sharing needs to be selective if benefits are
to outweigh costs. In this case, the following
domain-specific heuristics help tilt the balance in favor
of sharing by reducing the cost of detection, merging and
unsharing:

•  Cost(detect):  This  involves  detecting  two  or  more 
agents  (opponents)  to  be  part  of  a  group  with 
sharable  models.  Such  a  group  is  detected  at  low 
cost  by  testing  the  agents’  physical  proximity  and 
direction  of  movement.  If  these  are  within  the 
ranges  provided  by  domain  experts,  the  correspond¬ 
ing  agents’  models  are  shared.  If  the  agents  move 
away  from  each  other  (outside  of  this  range)  their 
models  are  unshared.  Once  outside  this  range,  no 
attempt  is  made  at  model  sharing  —  such  agents 
are  likely  to  be  engaged  in  dissimilar  activities,  and 
even  if  their  models  are  found  to  be  near-identical, 
they  are  likely  to  be  so  for  a  short  time  period. 

• Cost(merge): Merging involves the cost of selecting
a  paradigmatic  agent  within  the  group.  It  may  be 
possible  to  select  an  agent  at  random  from  the  group 
for  this  role.  However,  an  agent  in  some  prominent 


position,  such  as  in  front  of  the  group,  is  possibly  a 
better  fit  for  the  role  of  a  paradigmatic  agent,  and 
can  also  be  picked  out  at  a  low  cost.  In  air-combat 
simulation,  an  agent  in  such  a  position  is  typically 
the  leader  of  the  group  of  attacking  opponents.  It 
initiates  maneuvers,  and  others  follow  with  a  small 
time-lag. The group leader is thus ideal as a paradigmatic
agent. Note that a dynamic change in the
paradigmatic agent does not cause unsharing.

• Cost(unshare): Unsharing, however, has a rather
high cost. For instance, once D and E are detected
to have unshared models, a completely new
model_LE is constructed. Here, the entire state_LD
has to be copied to state_LE.

The  end  result  is  that  a  particular  agent’s  model  is 
selectively  executed  when  the  agent  breaks  away  from 
the  coordinated  group.  Otherwise,  its  model  is  merged 
with  the  paradigmatic  agent’s  model. 
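
The detection and merging heuristics of this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the TacAir-Soar implementation: the `Contact` record, the function names, and the thresholds `max_dist` and `max_heading_diff` are hypothetical stand-ins for the ranges that domain experts would supply.

```python
import math
from dataclasses import dataclass


@dataclass
class Contact:
    name: str
    x: float        # position east, in miles
    y: float        # position north, in miles
    heading: float  # degrees, 0 = north, increasing clockwise


def heading_diff(a, b):
    """Smallest absolute difference between two headings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def sharable(a, b, max_dist=5.0, max_heading_diff=20.0):
    """Cheap detection test: close together and moving the same way.
    The two thresholds are illustrative assumptions."""
    dist = math.hypot(a.x - b.x, a.y - b.y)
    return dist <= max_dist and heading_diff(a.heading, b.heading) <= max_heading_diff


def pick_paradigmatic(group):
    """Pick the agent furthest ahead along the group's mean heading,
    approximating the leader 'in front of the group'."""
    sx = sum(math.sin(math.radians(c.heading)) for c in group)
    sy = sum(math.cos(math.radians(c.heading)) for c in group)
    norm = math.hypot(sx, sy)
    if norm == 0.0:
        return group[0]  # headings cancel out; fall back to any member
    ux, uy = sx / norm, sy / norm
    return max(group, key=lambda c: c.x * ux + c.y * uy)


d = Contact("D", 0.0, 10.0, 0.0)
e = Contact("E", 1.0, 8.0, 5.0)
if sharable(d, e):
    leader = pick_paradigmatic([d, e])
    print("track", leader.name, "and share the other model with it")
```

Both tests use only position and heading, which is what keeps the detection cost low; unsharing would be triggered when `sharable` stops holding for a pair.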

5  Sharing  in  Recursion  Hierarchy 

Models  of  a  single  agent  across  a  recursion  hierarchy 
are  likely  to  be  near-identical  to  each  other,  and  thus 
they form the second category of models that may allow
sharing. We have so far limited our investigation
of sharing/unsharing to models with r ≤ 3, and
specifically to different models of L, such as model_LDL and
model_LEL at r = 3, with models at r = 1. Other
models, including those at deeper levels of nesting
(r ≥ 4), are never unshared. For instance, model_LDLD is
never unshared from model_LD. The motivation for
this restriction is in part that in our interviews with
domain experts, references to unshared models at r ≥ 4
have rarely come up. In part, this also reflects the
complexity of such unsharing, and it is thus an important
issue to be addressed in future work.

To understand the cost-benefit tradeoffs of sharing
recursive models, it is first useful to understand how
sharing and unsharing may actually occur. One general
technique for accomplishing sharing in the recursion
hierarchy is to first let model_L generate its
operator hierarchy. As the hierarchy is generated,
if model_LDL agrees with an operator_L (that is,
it would have generated an identical operator_LDL
given state_LDL), then it (model_LDL) “votes” in
agreement. This “vote” indicates that that particular
operator_L from model_L is now shared with
model_LDL. This essentially corresponds to the worst
case strategy introduced in Section 2: given a choice
among operators_LDL, the one that is identical to
operator_L is selected and shared.

Thus,  the  detection/merging  cost  is  low,  since  this 
can  be  accomplished  without  an  extensive  comparison 
of  models.  Furthermore,  the  savings  from  model  shar¬ 
ing  are  substantial  —  as  discussed  below,  unsharing 
occurs  over  small  time  periods.  Furthermore,  the  un¬ 
sharing  cost  is  low,  since  it  does  not  involve  state  copy¬ 
ing.  Thus,  sharing  benefits  appear  to  easily  outweigh 


its  costs. 
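
The voting scheme described above can be illustrated with a small sketch. It is a simplified, assumed rendering rather than the actual Soar mechanism: operator hierarchies are plain lists of names, the recursive model's options at each level are given as sets, and the function name `share_hierarchy` is hypothetical.

```python
def share_hierarchy(own_ops, recursive_choices):
    """Walk L's operator hierarchy top-down and record where the
    recursive model agrees with it.

    own_ops: operator names selected by model_L, top level first.
    recursive_choices: list of sets; entry i holds the operators that
        model_LDL could select at level i given state_LDL.
    Returns (shared, unshared_from): the shared prefix of the hierarchy
    and the first level at which the hierarchies diverge (None if the
    whole hierarchy is shared).
    """
    shared = []
    for level, op in enumerate(own_ops):
        candidates = recursive_choices[level] if level < len(recursive_choices) else set()
        if op in candidates:
            shared.append(op)      # a "vote" in agreement: reuse L's operator
        else:
            return shared, level   # divergence: unshare from this level down
    return shared, None


# Illustrative hierarchy; the lowest-level names are assumptions.
own = ["intercept", "achieve-proximity", "maintain-heading"]
agree = [{"intercept"}, {"achieve-proximity", "employ-missile"}, {"maintain-heading"}]
differ = [{"intercept"}, {"employ-missile"}, {"get-steering-circle"}]

print(share_hierarchy(own, agree))   # fully shared
print(share_hierarchy(own, differ))  # unshared below "intercept"
```

Agreement is checked one operator at a time against L's own selection, so no whole-model comparison is needed, and unsharing only marks a divergence point without copying any state.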

Unsharing actually occurs because of differences
between state_L and state_LDL. Due to these differences,
the recursive model_LDL cannot generate an operator
that is shared with operator_L. There is then unsharing
of the operator hierarchies in model_LDL and model_L,
which may be harnessed in service of deceptive (or
other) tactics. In the following, Subsection 5.1 focuses
on one general strategy for such deception. Subsection
5.2 focuses on a special class of differences between
recursive states, caused by sensor imperfections, and
the deceptive maneuvers possible due to those
differences.


5.1 Deception

Due to differences between state_LDL and state_L,
model_LDL may generate an operator_LDL that cannot
be shared in the operator_L hierarchy. This indicates
to L that D expects L to be engaged in a
different maneuver (operator_LDL) than the one it is
actually executing (operator_L). In such cases, L may
attempt to deceive D by abandoning its on-going
maneuver and “playing along” with what it believes to be
D’s expectations.

To  understand  this  deceptive  strategy,  consider  the 
following  case  of  L’s  deceptive  missile  firing.  Let  us 
go back to the situation in Figure 1-a, although now,
assume that state_L maintains a secret missile range
of 30 miles, while state_LDL maintains the range to
be 50 miles. The missile range is noted in the unshared
portions of the states as shown in Figure 3. At
a range of 50 miles, given that state_LDL notes the
missile range to be 50 miles, model_LDL suggests
the execution of an employ-missile_LDL operator. This
causes unsharing with operators in model_L.
Employ-missile_LDL subgoals into get-steering-circle_LDL,
indicating a turn to point at target, as shown in
Figure 1-b.

These operators suggest actions for L in order
to deceive its opponent. L may execute deceptive
operators_L that create the external actions suggested
by operator_LDL without actually launching a missile.
L therefore executes an employ-missile-deceptive_L
operator. This subgoals into the
get-steering-circle-deceptive_L operator. This causes the next
subgoal, of start-&-maintain-turn_L in model_L, which actually
causes L to turn to point at its target, D. This
difference in model_L and model_LDL causes some unsharing
in their operator hierarchies, as shown in Figure
4. After pointing at target, model_LDL executes the
fpole_LDL operator; that is, L believes that D is
expecting L’s fpole to support an actual missile in the air.
L executes fpole-deceptive_L without actually firing a
missile. Thus, with a deceptive maneuver, L convinces
D that it has fired a missile at a much longer range,
without actually firing one, forcing D to go on the
defensive by turning towards beam.

L can employ a whole class of such deceptive maneuvers
by going along with model_LDL’s expectation, as
it did here. This is essentially a general strategy for
deceptive maneuvers, which is instantiated with particular
deceptive maneuvers in real-time. Yet, this is only a
first step towards a full-fledged deceptive agent. There
are many other deceptive techniques and issues that
remain unresolved, e.g., determining whether engaging
in deception would lead to a globally sub-optimal
behavior.

Figure 4: Deceptive missile firing: operator_L and
operator_LDL hierarchies are dynamically unshared.

5.2  Sensor  Imperfections 

Realistic  radar  imperfections  in  this  domain  also  lead 
to  unsharing  among  recursive  models.  It  is  useful  to 
examine  these  in  some  detail,  since  these  are  illustra¬ 
tive  of  the  types  of  differences  that  are  expected  to 
arise  in  other  domains  where  agents  have  realistic  sen¬ 
sors.  To  this  end,  it  is  useful  to  classify  the  different 
situations  resulting  from  these  imperfections  as  shown 
in Figure 5. As a simplification, these situations
describe L’s perspective as it interacts with a single
opponent D, and are limited to r ≤ 3. Figure 5-a focuses
on an agent’s awareness of another’s presence. In the
figure, Aware<BAZ> denotes someone’s awareness of
an agent named BAZ. Furthermore, a subscript L
indicates L’s own situation, a subscript LD indicates L’s
tracking of D, and a subscript LDL indicates L’s recursive
tracking of D’s tracking of L. Thus, the first branch
point in 5-a indicates whether L is aware of D’s
presence (+Aware_L<D>) or unaware (-Aware_L<D>).
If -Aware_L<D>, then L cannot track D’s awareness.
If +Aware_L<D>, then L may believe that D
is aware of L’s presence (+Aware_LD<L>) or unaware
(-Aware_LD<L>). If +Aware_LD<L>, then L may
have beliefs about D’s beliefs about L’s awareness:
+Aware_LDL<D> or -Aware_LDL<D>.

While an agent may be aware of another, it may
not have accurate sensor information of the other
agent’s actions, specifically, turns, climbs and dives.
For instance, in Figure 1-d, +Aware_L<D>, yet L
loses radar contact due to D’s beam. Figure 5-b
classifies these situations. Here, +Sense_L<D> refers
to situations where L believes it has accurate sensor
information of D’s actions, while -Sense_L<D>
refers to situations as in Figure 1-d, where it does
not. In either case, L may believe that D either
has accurate sensor information of L’s actions
(+Sense_LD<L>) or not (-Sense_LD<L>). Thus,
in Figure 1-d, while -Sense_L<D>, L also believes
D has lost radar contact due to its 90° beam
turn (-Sense_LD<L>). Recursion continues with
+Sense_LDL<D> and -Sense_LDL<D>.

Figure 5: Classifications by: (a) awareness; (b) accuracy
of sensor information.

Based on the above classification, L’s perspective
of a situation may be described as a six-tuple.
For instance, Figures 1-a to 1-c may
be described as (+Aware_L<D>, +Aware_LD<L>,
+Aware_LDL<D>, +Sense_L<D>, +Sense_LD<L>,
+Sense_LDL<D>). This is the previously introduced
ideal sensor situation, with awareness and sensor
accuracy. Based on the six-tuple, 64 such situations
seem possible. However, many are ruled out (if an
agent is unaware of another, it cannot have accurate
sensor information regarding that agent), reducing
the number of possible situations to 15.
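
The count of 15 can be reproduced by brute-force enumeration. The sketch below encodes one reading of the constraints in this section, and that reading is an assumption: each deeper awareness level is only meaningful if the level above it holds, and accurate sensing at a level requires awareness at that level. The function and variable names are illustrative.

```python
from itertools import product


def consistent(aware_l, aware_ld, aware_ldl, sense_l, sense_ld, sense_ldl):
    """Return True if a six-tuple of +/- values (True = '+') is possible."""
    # Awareness nests: L cannot track D's awareness unless L is aware of D,
    # and cannot hold beliefs at depth 3 unless the depth-2 belief is '+'.
    if aware_ld and not aware_l:
        return False
    if aware_ldl and not aware_ld:
        return False
    # No accurate sensor information without awareness at the same level.
    if sense_l and not aware_l:
        return False
    if sense_ld and not aware_ld:
        return False
    if sense_ldl and not aware_ldl:
        return False
    return True


situations = [t for t in product([True, False], repeat=6) if consistent(*t)]
print(len(situations))  # 15

# Situations in which L believes D is unaware of L (-Aware_LD<L>).
d_unaware = [t for t in situations if not t[1]]
print(len(d_unaware))  # 3
```

Under this reading, the enumeration also yields three situations with -Aware_LD<L>, which agrees with the three initial-combat situations discussed in the following paragraph.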

Within these 15, we have so far examined unsharing
and deception in the context of one situation, namely
the ideal situation. We now briefly examine the
unsharing and deception possible in the remaining 14
situations. Among these 14, there are three that typically
arise in the initial portions of the combat where
L believes D is unaware of L (-Aware_LD<L>). For
instance, L may have seen D by virtue of its longer
range radar, but it may have assumed that D is still
unaware due to its shorter range radar: (+Aware_L<D>,
-Aware_LD<L>, -Aware_LDL<D>, +Sense_L<D>,
-Sense_LD<L>, -Sense_LDL<D>). In all these
cases, model_LDL is null, and thus the question of
sharing with model_L does not arise. Suppose as the
aircraft move even closer, D engages in the collision
course maneuver, which allows L to conclude that
+Aware_LD<L>. Here, there are two possibilities.
First, if -Aware_LDL<D>, i.e., L believes D believes
L is unaware of D, there is much greater dissimilarity
between model_L and model_LDL. Model_LDL now
predicts that L will not engage in combat with D, i.e.,
there will be unsharing even with the intercept_L
operator. Once again, L may deceive D by acting consistent
with model_LDL’s expectation, and not turn towards
D. This is similar to the deceptive strategy introduced
in Section 5.1. L may then wait till D gets closer and
then turn to attack.

The second possibility is +Aware_LDL<D>. In this
case, we return to the ideal situation in Figures 1-a
to 1-c, where unsharing is still possible as in Figure
4. Furthermore, even with +Aware_LDL<D>, there
are situations with -Sense_LD<L>, where L believes
D cannot sense L’s actions. In such cases, L may
engage in deception by deliberately not acting
consistent with model_LDL’s expectations, e.g., diving when
model_LDL does not expect such a dive. Such deliberate
unsharing is another type of deceptive strategy,
one among many that we have not examined
in detail so far.

6  Experimental  Results 

To understand the effectiveness of the agent tracking
method introduced here, we have implemented an
experimental variant of TacAir-Soar (Tambe et al. 1995).
The original TacAir-Soar system contains about 2000
rules, and automated pilots based on it have participated
in combat exercises with expert human pilots
(Tambe et al. 1995). Our experimental version,
created since it employs an experimental agent tracking
technology, contains about 950 of these rules.
This version can recursively track actions of individuals
or groups of opponents while using the model sharing
optimizations, and engaging in deception. Proven
techniques from this experimental version are
transferred back into the original TacAir-Soar.

Table 1 presents experimental results for typical
simulated air-combat scenarios provided by the domain
experts. Column 1 indicates the number of opponents
(N) faced by our TacAir-Soar-based agent L. Column
2 indicates whether the opponents are engaged in a
coordinated attack. Column 3 shows the actual maximum
number of models used in the combat scenarios
with the optimizations (excluding temporary model
unsharing in service of deception). The numbers in
parentheses are the projected number of models, 2N+1,
without the model sharing optimization (the actual
number without sharing should be O(N^r), but we
exclude the permanently shared models from this count;
see Section 5). With optimizations, as expected,
the number of models is N+1 when opponents are not
coordinated, and just two when the opponents are
coordinated. Column 4 shows the actual and projected
number of operator executions. The projected number
is calculated assuming 2N+1 models. Column 5 shows
a roughly 1.5- to 4-fold reduction (projected/actual) in
the number of operators. Savings are higher with
coordinated opponents. L is usually successful in real-time
tracking in that it is able to track opponents’ behaviors
rapidly enough to be able to respond to them. L
is unsuccessful in real-time tracking in the case of four
uncoordinated opponents (with 5 models), and it gets
shot down (hence fewer total operators than the case
of 2 opponents). This failure indicates that our
optimizations have helped: without them, L could have
failed in all cases of 2 or 4 opponents since they involve
5 or more projected models. It also indicates that L
may need additional optimizations.


N   Coord?   Max. number of models   Total operators      Reduction
             actual (projected)      actual (projected)   (projected/actual)
1   -        2 (3)                   143 (213)            1.5
2   No       3 (5)                   176 (314)            1.8
4   No       5 (9)                   148 (260)            1.8
2   Yes      2 (5)                   109 (251)            2.3
4   Yes      2 (9)                   105 (407)            3.9

Table 1: Improvements due to model sharing.
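
The derived columns of Table 1 follow directly from the raw counts, and the short sketch below recomputes them as a consistency check. The operator counts are copied from the table; the helper names are illustrative assumptions.

```python
# (N, coordinated?, actual max models, actual operators, projected operators)
ROWS = [
    (1, None, 2, 143, 213),
    (2, False, 3, 176, 314),
    (4, False, 5, 148, 260),
    (2, True, 2, 109, 251),
    (4, True, 2, 105, 407),
]


def projected_models(n):
    """Projected number of models without the sharing optimization."""
    return 2 * n + 1


def expected_models(n, coordinated):
    """With sharing: two models for a coordinated group, N + 1 otherwise."""
    return 2 if coordinated else n + 1


def reductions(rows):
    """Projected/actual operator ratio per row, rounded to one decimal."""
    return [round(proj / actual, 1) for _, _, _, actual, proj in rows]


for (n, coord, models, _, _), ratio in zip(ROWS, reductions(ROWS)):
    assert models == expected_models(n, coord)
    print(n, coord, models, projected_models(n), ratio)
```

The recomputed ratios are 1.5, 1.8, 1.8, 2.3 and 3.9, matching the last column of the table.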


7  Summary 

This paper focused on real-time recursive tracking of
agents and agent-groups in dynamic, multi-agent
environments. Our investigation was based on intelligent
pilot agents in a real-world synthetic air-combat
environment, already used in a large-scale operational
military exercise (Tambe et al. 1995). Possible take-away
lessons from this investigation include:

• Address recursive agent tracking via a uniform
treatment of the generation of flexible/reactive behaviors,
as well as of tracking and recursive tracking.

• Alleviate tracking costs via model sharing, with
selective unsharing in situations where models grow
sufficiently dissimilar.

• Track group activities by tracking a paradigmatic
agent.

• Exploit differences in an agent’s self model and its
recursive self model in service of deception and other
actions.

One key issue for future work is understanding the
broader applicability of these lessons. To this end, we
plan to explore the relationships of our approach with
formal methods for recursive agent modeling
(Gmytrasiewicz, Durfee, & Wehe 1991; Wilks & Ballim
1987). This may help generalize the tracking approach
introduced in this paper to other multi-agent
environments, including ones for entertainment or education.

Abstract

Agent tracking involves monitoring the observable
actions of other agents as well as inferring
their unobserved actions, plans, goals and
behaviors. In a dynamic, real-time environment,
an intelligent agent faces the challenge
of tracking other agents’ flexible mix of
goal-driven and reactive behaviors, and doing so
in real-time, despite ambiguities. This paper
presents RESC (REal-time Situated Commitments),
an approach that enables an intelligent
agent to meet this challenge. RESC’s situatedness
derives from its constant uninterrupted
attention to the current world situation: it
always tracks other agents’ on-going actions in
the context of this situation. Despite ambiguities,
RESC quickly commits to a single interpretation
of the on-going actions (without an
extensive examination of the alternatives), and
uses that in service of interpretation of future
actions. However, should its commitments lead
to inconsistencies in tracking, it uses
single-state backtracking to undo some of the
commitments and repair the inconsistencies. Together,
RESC’s situatedness, immediate commitment,
and single-state backtracking conspire in
providing RESC its real-time character.

RESC is implemented in the context of intelligent
pilot agents participating in a real-world
synthetic air-combat environment. Experimental
results illustrating RESC’s effectiveness are
presented.¹


1 Introduction

In a multi-agent environment, an automated agent often
needs to interact intelligently with other agents to
achieve its goals. Agent tracking (monitoring other
agents’ observable actions and inferring their unobserved
actions, plans, goals and behaviors) is a key capability
required to support such interaction.

This paper focuses on agent tracking in real-time,
dynamic environments. Our approach is to first build
agents that are (reasonably) successful in agent tracking
in such environments, and then attempt to understand
the underlying principles. Thus, the approach is one of
first building an “interesting” system for a complex
environment, and then understanding why it does or does
not work (see [Hanks et al., 1993] for a related
discussion). In step with this approach, we are investigating
agent tracking in the context of our on-going effort to
build intelligent pilot agents for a real-world synthetic
air-combat environment [Tambe et al., 1995]. This
environment is based on a commercially developed simulator
called ModSAF [Calder et al., 1993], which has already
been used in an operational military exercise involving
expert human pilots. For an illustrative example of agent
tracking in this environment, consider the scenario in
Figure 1. It involves two combating pilot agents: L in
the light-shaded aircraft and D in the dark-shaded one.

¹ We thank Rick Lewis and Yasuo Kuniyoshi for helpful
feedback. This research was supported under subcontract to
the University of Southern California Information Sciences
Institute from the University of Michigan, as part of contract
N00014-92-K-2015 from the Advanced Systems Technology
Office (ASTO) of the Advanced Research Projects Agency
(ARPA) and the Naval Research Laboratory (NRL).


Figure  1:  A  simulated  air-combat  scenario.  An  arc  on 
an  aircraft’s  nose  shows  its  turn  direction. 

Initially,  L  and  D’s  aircraft  are  50  miles  apart,  so  they 
can  only  see  each  other’s  actions  on  radar.  For  effective 
performance,  they  have  to  continually  track  these  ac¬ 
tions.  Indeed,  D  is  able  to  survive  a  missile  attack  by  L 
in  this  scenario  due  to  such  tracking,  despite  the  missile 
being  invisible  to  D’s  radar.  In  particular,  in  Figure  1- 
a,  D  observes  L  turning  its  aircraft  to  a  collision-course 
heading  (i.e.,  at  this  heading,  L  will  collide  with  D  at 


the  point  shown  by  x).  Since  this  heading  is  often  used 
to  reach  one’s  missile  firing  range,  D  infers  the  possibil¬ 
ity  that  L  is  trying  to  reach  this  range  to  fire  a  missile. 
In Figure 1-b, D turns its aircraft 15° right. L reacts
by turning 15° left, to maintain collision course. In
Figure 1-c, L reaches its missile range, points its aircraft
at D’s aircraft and fires a radar-guided missile. While
D cannot see the missile on its radar, it observes L’s
turn, and infers it to be part of L’s missile firing
behavior. Subsequently, D observes L executing a 35° turn
away from its aircraft (Figure 1-d). D infers this to be
an fpole turn, typically executed after firing a missile to
provide radar guidance to the missile, while slowing the
closure between the two aircraft. While D still cannot
observe the missile, it is now sufficiently convinced to
attempt to evade the missile by turning 90° relative to
the direction of L’s aircraft (Figure 1-e). This beam turn
causes D’s aircraft to become invisible to L’s (doppler)
radar. Deprived of radar guidance, L’s missile is
rendered harmless.

Meanwhile,  L  tracks  D’s  beam  turn  in  Figure  1-e,  and 
prepares  counter-measures  in  anticipation  of  the  likely 
loss  of  both  its  missile  and  radar  contact. 

Thus, the pilot agents need to continually track their
opponents’ actions, such as turns, and infer unobserved
actions, high-level goals and behaviors, such as the fpole,
beam or missile firing behaviors. This agent tracking
capability is related to plan-recognition [Kautz and Allen,
1986; Azarewicz et al., 1986]. The key difference is that
plan-recognition efforts typically focus on tracking a
narrower (plan-based) class of agent behaviors, as seen in
static, single-agent domains. In particular, they assume
that agents rigidly follow plans step-by-step. In contrast,
agent tracking involves the novel challenge of tracking a
broader mix of goal-driven and reactive behaviors. This
capability is important for dynamic environments such
as air-combat simulation where agents do not rigidly
follow plans; as just seen, pilot agents continually react
to each other’s maneuvers.

Agent  tracking  and  plan  recognition  are  both  part  of 
a  larger  family  of  comprehension  capabilities  that  enable 
an  agent  to  parse  a  continuous  stream  of  input  from  its 
environment,  whether  it  be  in  the  form  of  natural  lan¬ 
guage  or  speech  or  music  or  simulated  radar  input,  as 
is  the  case  here  (e.g.,  see  [Rich  and  Knight,  1990,  chap¬ 
ter  14]).  Resolving  ambiguities  in  the  input  stream  is 
a  key  problem  when  parsing  all  of  these  different  types 
of  input.  One  example  of  the  ambiguity  faced  in  agent 
tracking  can  be  seen  in  L’s  turn  in  Figure  1-c.  From  D’s 
perspective, L could be turning to fire a missile. Alternatively,
L could be beginning a 180° turn to run away
from  combat.  Or  L  could  simply  be  following  its  flight 
plan,  particularly  if  it  has  a  much  shorter  radar  range, 
and  thus  is  likely  unaware  of  D.  Despite  such  ambigui¬ 
ties,  D  has  to  track  L’s  actions  with  sufficient  accuracy 
so  as  to  respond  appropriately.  The  novel  challenge  in 
this  domain  —  at  least  with  respect  to  previous  work  in 
plan  recognition  —  is  that  the  ambiguity  resolution  has 
to  occur  in  real-time.  As  the  world  rapidly  moves  on,  an 
agent  cannot  lag  behind  in  tracking.  Thus,  if  D  is  late  or 
inaccurate  in  its  tracking  of  L’s  missile  firing  maneuvers 


in  Figure  1-c,  it  may  not  evade  the  missile  in  time. 

This  paper  describes  an  approach  called  RESC  (REal- 
time  Situated  Commitments)  for  agent  tracking  that  ad¬ 
dresses  the  above  challenges.  RESC’s  situatedness  rests 
on  its  constant  attention  to  the  current  world  situation, 
and  its  tracking  of  other  agents’  actions  in  the  context  of 
this  situation.  Despite  its  situatedness,  RESC  does  make 
some  commitments  about  the  other  agent’s  unobserv¬ 
able  actions,  behaviors  and  goals,  and  attempts  to  use 
those  in  tracking  the  agent’s  future  actions.  In  ambigu¬ 
ous  situations,  these  commitments  could  be  inappropri¬ 
ate  and  could  lead  to  failures  in  tracking  —  in  such  cases, 
RESC  modifies  them  on-line,  without  re-examining  past 
world  states.  Together,  RESC’s  situatedness,  immediate 
commitments  (despite  the  ambiguities),  and  its  on-line 
modification of commitments provide RESC its real-time
character. 

In  the  following,  we  first  describe  the  process  that 
RESC  employs  for  tracking  other  agent’s  flexible  and 
reactive  behaviors  (Section  2).  This  process  enables 
RESC  to  be  situated  in  its  present  as  it  tracks  an 
agent’s  actions.  Subsequently,  RESC’s  ambiguity  res¬ 
olution  and  real-time  properties  are  described  in  Section 
3.  These  descriptions  are  provided  in  concrete  terms, 
using  an  implementation  of  the  pilot  agents  in  a  system 
called TacAir-Soar [Tambe et al., 1995], built using the
Soar architecture [Newell, 1990; Rosenbloom et al., 1991].
We  assume  some  familiarity  with  Soar’s  problem-solving 
model,  which  involves  applying  operators  to  states  to 
reach  a  desired  state. 

2  Tracking  Flexible  Goal-driven  and 
Reactive  Behaviors 

In  an  environment  such  as  air-combat  simulation,  agents 
possess  similar  behavioral  flexibility  and  reactivity. 
Thus,  the  (architectural)  mechanisms  that  an  agent  em¬ 
ploys  in  generating  its  own  behaviors  may  be  used  for 
tracking  others’  flexible  and  reactive  behaviors.  Con¬ 
sider,  for  instance,  D’s  tracking  of  L’s  behaviors  in  Fig¬ 
ure  1-c.  D  generates  its  own  behavior  using  the  operator 
hierarchy  shown  in  Figure  2-a.  (The  solid  lines  indicate 
the  actual  hierarchy,  and  the  dashed  lines  indicate  unse¬ 
lected  options.)  Here,  at  the  top-level,  D  is  executing  its 
mission (to protect its home-base for a given time period) via the
execute-mission operator. Since the termination condition of this
operator (completion of D’s mission) is not yet achieved, a subgoal
is generated.^ D rejects options such as follow-flight-plan and
run-away in this subgoal in favor of the intercept operator, so as
to combat L. In service of intercept, D selects employ-missile in
the next subgoal. However, since D has not reached its missile
firing range and position, it selects get-firing-position in the
next subgoal. Skipping to the final subgoal, maintain-heading
enables D to maintain its heading, as seen in Figure 1-c.

^ A Soar operator has termination conditions: if the operator’s
application (or new sensor input) changes the state so as to satisfy
the termination conditions, then that operator and all of its
subgoals are terminated. If the termination conditions remain
unsatisfied, a subgoal is created, within which new operators are
applied.


Figure 2: Operator hierarchies: (a) operator_D hierarchy;
(b) operator_DL hierarchy. Solid lines indicate actual selections;
dashed lines indicate unselected options.

The operators used for generating D’s own actions, such as in
Figure 2-a, will be denoted with the subscript D, e.g., intercept_D.
Operator_D will denote an arbitrary operator of D. State_D will
denote the global state shared by all of these operators. It
maintains all of the dynamic sensor input regarding D’s own
aircraft, such as its heading and altitude. It also maintains
dynamic radar input regarding L’s aircraft, such as heading, range,
collision course and other geometric relationships. Additionally,
it maintains non-sensor information, e.g., D’s missile
capabilities. Together, state_D and the operator_D hierarchy
constitute the introspectable aspect of D, and in this sense may be
considered as D’s model of its present self, referred to as model_D.

Model_D supports D’s flexible/reactive behaviors via its embedding
within Soar; and in particular, via two of Soar’s architectural
features: (i) a decision procedure that supports flexibility by
integrating all available knowledge about absolute or relative
worth of candidate operators right before deciding to commit to a
single operator; (ii) termination conditions for operators that
support reactivity by terminating operators in response to the
given situation [Rosenbloom et al., 1991]. The point here is not
that these specific architectural features are the only way to
yield such behavior, but rather that there are such features, and
that they can be reused in tracking other agents’ behaviors. To
illustrate this re-use, we assume for now that D and L possess an
identical set of maneuvers. (Note that this sameness of maneuvers
is not necessary; all that is required is for D to have an accurate
model of its opponent’s maneuvers.)
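
As a rough illustration of how a decision procedure and termination
conditions together can yield a hierarchy like the one in Figure 2-a,
the following minimal Python sketch rebuilds the active operator path
from the current state. It is not Soar's actual API: the Operator
class, the numeric worth functions, and the state keys
(mission_complete, opponent_detected, in_firing_position,
missile_fired) are all hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, bool]
Worth = Callable[[State], int]


class Operator:
    """Hypothetical stand-in for a Soar operator (not Soar's real API)."""

    def __init__(self, name: str, done: Callable[[State], bool],
                 children: Optional[List[Tuple[Worth, "Operator"]]] = None):
        self.name = name
        self.done = done                # termination condition
        self.children = children or []  # (worth function, candidate operator)


def active_hierarchy(top: Operator, state: State) -> List[str]:
    """Names of the selected operators, top-level first, for this state."""
    path: List[str] = []
    op: Optional[Operator] = top
    while op is not None and not op.done(state):
        path.append(op.name)
        # Decision step: weigh every candidate in the subgoal, then commit.
        scored = [(worth(state), child) for worth, child in op.children]
        viable = [(w, c) for w, c in scored if w > 0 and not c.done(state)]
        op = max(viable, key=lambda wc: wc[0])[1] if viable else None
    return path


# Illustrative fragment of D's hierarchy (assumed structure and scores).
get_position = Operator("get-firing-position",
                        done=lambda s: s["in_firing_position"])
final_maneuver = Operator("final-missile-maneuver",
                          done=lambda s: s["missile_fired"])
employ_missile = Operator(
    "employ-missile", done=lambda s: s["missile_fired"],
    children=[(lambda s: 0 if s["in_firing_position"] else 1, get_position),
              (lambda s: 1 if s["in_firing_position"] else 0, final_maneuver)])
intercept = Operator("intercept", done=lambda s: not s["opponent_detected"],
                     children=[(lambda s: 1, employ_missile)])
follow_plan = Operator("follow-flight-plan", done=lambda s: False)
execute_mission = Operator(
    "execute-mission", done=lambda s: s["mission_complete"],
    children=[(lambda s: 2 if s["opponent_detected"] else 0, intercept),
              (lambda s: 1, follow_plan)])

state_d = {"mission_complete": False, "opponent_detected": True,
           "in_firing_position": False, "missile_fired": False}
print(active_hierarchy(execute_mission, state_d))
```

Because the path is recomputed from the state, a change such as
in_firing_position becoming true replaces get-firing-position with
final-missile-maneuver without any explicit replanning, which is the
kind of reactivity the termination conditions are meant to provide.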

Thus, D uses a hierarchy such as the one in Figure 2-b to track L’s
behaviors. Here, the hierarchy (the solid lines in Figure 2-b)
represents D’s model of L’s current operators in the situation of
Figure 1-c. These operators are denoted with the subscript DL. This
operator_DL hierarchy, and the state_DL that goes with it,
constitute D’s model of L, or model_DL. Within model_DL,
execute-mission_DL denotes the operator that D uses to track L’s
mission execution. Since L’s mission is not yet complete, D applies
the intercept_DL operator in the subgoal to track L’s intercept.
The unselected alternatives here, e.g., run-away_DL, indicate the
ambiguity in tracking L’s actions (however, assume for now that
this is accurately resolved). In the next subgoal,
employ-missile_DL is applied. Since L has reached its missile
firing position, in the next two subgoals,
final-missile-maneuver_DL tracks L’s final missile maneuver, and
point-at-target_DL tracks L’s turning to point at D. In the final
subgoal, D applies the start-&-maintain-turn_DL operator to
state_DL, which does not (cannot) actually cause L to turn.
Instead, this operator predicts L’s action and matches the
prediction against L’s actual action. Thus, if L starts turning to
point at D’s aircraft, then there is a match with model_DL’s
predictions: D believes L is turning to point at its target, D, to
fire a missile. When L’s aircraft turns sufficiently to point
straight at D’s aircraft (Figure 1-c), the termination condition of
the point-at-target_DL operator is satisfied, and it is terminated.
A new operator, push-fire-button_DL, is then applied in the subgoal
of final-missile-maneuver_DL. This operator predicts a missile
firing, although the missile cannot actually be observed. State_DL
maintains a representation of the missile, and marks it with a low
likelihood. Following that, the fpole-right_DL operator predicts
L’s right turn for an fpole. When this prediction is matched with
L’s turn in Figure 1-d, the missile’s likelihood is changed to
high. D now attempts to evade the missile, with beam-right_D. (D
currently chooses arbitrarily between the execution of operator_D
and operator_DL, as it generates its own actions, while also
tracking L’s actions.)
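
The predict-and-match step at the end of this walkthrough can be
sketched in a few lines of Python. This is only an illustration
under assumptions: the dictionary keys (missile_likelihood, match)
and the encoding of turns as the strings "left" and "right" are
hypothetical, not taken from TacAir-Soar.

```python
from typing import Dict


def track_final_missile_maneuver(state_dl: Dict[str, str],
                                 observed_turn: str) -> Dict[str, str]:
    """Return an updated copy of state_DL after the two tracking operators."""
    new_state = dict(state_dl)
    # push-fire-button_DL: the firing itself cannot be observed, so the
    # inferred missile is recorded with a low likelihood.
    new_state["missile_likelihood"] = "low"
    # fpole-right_DL predicts a right turn; compare it with L's actual turn.
    predicted_turn = "right"
    if observed_turn == predicted_turn:
        new_state["match"] = "success"
        new_state["missile_likelihood"] = "high"
    else:
        new_state["match"] = "failure"
    return new_state


print(track_final_missile_maneuver({}, "right"))
```

A matching right turn raises the inferred missile's likelihood, while
any other turn leaves it low and reports a match failure.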

The above agent tracking process is related to previous work on
model tracing in intelligent tutoring systems (ITS) for tracking
student actions [Anderson et al., 1990; Ward, 1991]. However, that
work has primarily focused on static environments. A recently
developed ITS, REACT [Hill and Johnson, 1994], extends model
tracing to a more dynamic environment. REACT relies upon a
plan-driven tracking strategy, and deals with the more dynamic
aspects of the domain as special cases. It specifically abstracts
away from tracking students’ mental states. In contrast, pilots
appear to track their opponents’ behaviors in more detail. Such
tracking is supported here via a uniform apparatus for the
generation of an agent’s own flexible/reactive behaviors and
tracking other agents’ behaviors. In particular, operator_D and
operator_DL are selected and terminated in the same flexible
manner. Thus, as state_DL changes, which it does in reflecting the
changing world situation, new operators_DL may get selected in
response. This is key to RESC’s situatedness: L’s on-going actions
are continually tracked in the context of the current state_DL.
For further details on this tracking technique, please see
[Tambe and Rosenbloom, 1995].

3  Real-time  Ambiguity  Resolution 

Ambiguity manifests itself in two forms in the agent tracking
process introduced in the previous section. One form involves the
alternative operators available for tracking the other agent’s
actions, as seen in the dashed boxes in Figure 2-b. Given these
alternatives, it is difficult to make accurate selections of
operator_DL such that their predictions successfully match L’s
actions. Should an operator_DL selection be inaccurate, it
typically results in a match failure (if not immediately, then in
some further operator_DL application). Thus, in Figure 2-b, the
operator_DL hierarchy predicts L will turn to point at D’s
aircraft. Suppose this prediction is inaccurate, and L turns in the
opposite direction. This difference between the anticipated and
actual action leads to a match failure, indicating an inaccuracy in
tracking. Similar match failures can also occur if L fails to begin
(or stop) turning or maintain heading as anticipated.

A second form of ambiguity is seen in state_DL. State_DL needs to
maintain the same types of information as are in state_D. Here,
there is ambiguity related to both the dynamic sensor information
and the static non-sensor information. With respect to static
information, there are ambiguities about L’s radar and missile
capabilities. Even if these are resolved, there are ambiguities
about dynamic information, such as whether L has detected D on
radar. For instance, in Figure 1-a, based on the static radar range
information, D assumes it has arrived within L’s radar range; but L
may or may not have detected D, depending on its radar’s
orientation. Such ambiguities in state_DL are intimately connected
to ambiguities in operator_DL, since the operator_DL hierarchy is
dependent on the current state_DL. Thus, if D assumes it is not
detected on L’s radar, then the intercept_DL operator is ruled out,
since there is nothing for L to intercept. In contrast, if D
assumes L has detected it, then intercept_DL is a likely
possibility. A subgoal of intercept_DL predicts L’s turn to
collision course, which is matched by L’s turn in Figure 1-a: D now
believes L has detected it, and L is going to collision course to
intercept. Note that, if D believes that L has detected it,
state_DL needs to maintain the various dynamic inputs that D
believes L obtains from its radar regarding D’s heading, range,
geometric relationships, etc. Fortunately, many of these quantities
are symmetric and can be reused from corresponding quantities in
state_D.

It is difficult to resolve the above ambiguities using methods that
have been previously suggested in the model tracing literature.
Ward [Ward, 1991] notes that previous model tracing systems have
mostly relied on communication with the modeled agent to resolve
ambiguities. In air-combat simulation, such communication with
enemy pilots is clearly ruled out. Ward’s solution in the absence
of such information is an exhaustive backtrack search of all of the
different alternatives. In the example in Figure 2-b, this involves
an attempt to execute and match other operator hierarchies,
generated by alternatives such as run-away_DL or
follow-flight-plan_DL, via a systematic backtrack search before
committing to intercept_DL. Unfortunately, this search-then-commit
approach will very likely cause tracking to lag far behind the
rapidly changing world, precluding D from responding to L’s
maneuvers in real-time. Furthermore, given the volume of dynamic
information on state_DL, the maintenance of multiple old copies of
state_DL for backtracking could itself consume non-trivial amounts
of space and time. Parallel real-time search of alternative
models_DL could eliminate the backtracking; however, we will focus
on a sequential solution given the implementation technology
available to us. Furthermore, parallelism may not be adequate when
faced with the expected combinatorics in the number of
alternatives. Borrowing ambiguity resolution methods from the plan
recognition literature would be yet another possibility; but the
computational costs (intractability) of techniques such as
automated deduction [Kautz and Allen, 1986] are a significant
concern.

So instead, we propose a new approach called RESC (REal-time
Situated Commitments) that addresses the above concerns. As seen
earlier, RESC’s situatedness arises from its tracking of L’s
on-going actions and behaviors in the context of the current
state_DL. RESC’s commitment is to a single model_DL, with a single
state_DL that records the on-going world situation in real-time and
a single operator_DL hierarchy that provides an on-going
interpretation of an opponent’s actions. Given the intense
real-time pressure, RESC does not spend time trying to match
alternatives; instead, it just commits to a single operator_DL
hierarchy, and any facts inferred in state_DL due to this
hierarchy. It then tries to use these commitments as context for
tracking L’s future actions. However, in some cases, the
commitments may get withdrawn given RESC’s situatedness: as
state_DL changes, it may satisfy the termination conditions of an
operator_DL and thus cause it, and all of its subgoals, to
terminate.

When faced with ambiguity, it is possible that RESC commits to an
inaccurate operator_DL and state_DL, leading to a match failure.
RESC recovers from such failures by relying on a method called
single-state backtracking, which undoes some of its commitments,
resulting in the generation of new operator hierarchies, in
real-time. Of course, if RESC makes more intelligent commitments in
the first place, by reducing the ambiguity in the situation with
which it is faced, there will be less of a need for undoing its
commitments. Subsection 3.1 first describes strategies, some
general and some domain specific, used for reducing ambiguities in
both state_DL and operator_DL. Subsection 3.2 then describes
single-state backtracking.

3.1  Reducing  Ambiguities 

There  are  two  classes  of  strategies  used  in  RESC  to  re¬ 
solve  ambiguities:  active  and  passive.  The  active  strate¬ 
gies  rely  upon  an  agent’s  active  participation  in  its  en¬ 
vironment  to  gather  information  to  resolve  ambiguities. 
In  particular,  an  automated  pilot,  such  as  D,  can  act  in 
its  environment  and  force  its  opponent  L  to  react  and 
provide  disambiguating  information.  Consider  again  the 
example in Figure 1-a. As discussed earlier, D faces ambiguity in
state_DL about whether L’s radar has detected D. This gets resolved
with L’s turn to collision course. Unfortunately, if L just happens
to be on collision course, it may not turn any further, and the
ambiguity would be more difficult to resolve. In such cases, D can
randomly turn 15-20°, as shown in Figure 1-b, causing L
to  react  if  it  wishes  to  maintain  collision  course.  This 
provides  D  sufficient  disambiguating  information  —  L’s 
radar  has  detected  D.  Unfortunately,  D’s  actions  in  ser¬ 
vice  of  active  ambiguity  resolution  may  interfere  with 
its  other  goals,  such  as  firing  a  missile  at  L.  In  general, 
such  interference  is  difficult  to  resolve.  Therefore,  cur¬ 
rently,  active  ambiguity  resolution  is  based  on  a  fixed  set 
of  known  maneuvers  (supplied  by  human  experts). 

In  contrast,  passive  ambiguity  resolution  strategies 
rely  on  existing  information  to  resolve  ambiguities.  One 
key  piece  of  information  is  that  in  this  hostile  environ¬ 
ment,  an  opponent  is  likely  to  engage  in  the  most  harm¬ 
ful  maneuver.  This  information  is  used  in  the  form  of 
a worst case strategy for disambiguation. Thus, given a choice, D
always selects the worst-case operator_DL (from its own
perspective) while tracking L’s actions. For instance, if there is
ambiguity between run-away_DL and intercept_DL, D will select
intercept_DL, which is more harmful. Similarly, D resolves
ambiguity in the static information in state_DL via the worst-case
strategy, e.g., it assumes that L’s aircraft is carrying the most
powerful missiles and radar that it can carry. Unfortunately, this
worst-case  strategy  can  lead  to  overly  pessimistic  behav¬ 
ior.  In  the  absolute  worst-case,  the  only  option  for  D  is 
to  run  away.  Therefore,  D  applies  it  selectively,  typically 
in  cases  where  it  has  to  disambiguate  rapidly,  and  yet  no 
other  means  are  available.  Thus,  as  seen  above,  D  does 
not  automatically  assume  detection  by  L’s  radar,  even 
though  that  would  be  the  worst-case  assumption. 
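
A minimal sketch of the worst-case choice among tracking operators,
assuming each candidate can be given a harm score from D's
perspective. The scores and the HARM table below are hypothetical;
the text does not specify how harm is ranked.

```python
from typing import Dict, Iterable

# Hypothetical harm scores from D's perspective; larger is worse for D.
HARM: Dict[str, int] = {
    "run-away": 0,
    "follow-flight-plan": 1,
    "intercept": 2,
}


def worst_case_operator(candidates: Iterable[str],
                        harm: Dict[str, int] = HARM) -> str:
    """Pick the candidate tracking operator that is most harmful to D."""
    return max(candidates, key=lambda name: harm[name])


print(worst_case_operator(["run-away", "intercept"]))
```

In keeping with the selective use described above, such a function
would be called only when D must disambiguate rapidly and has no
other evidence to go on.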

A second passive ambiguity resolution strategy is test
incorporation [Bennett and Dietterich, 1986]. The key idea is to
generate fewer incorrect alternatives in ambiguous situations. In
particular, model_DL generates alternative operators_DL that are
tested by matching against L’s actual actions. Observations
regarding these actions can be used to avoid generating
alternatives that are guaranteed to lead to match failures. For
instance, in Figure 1-d, fpole-right_DL and fpole-left_DL are
alternatives available to D in tracking L’s actions. If D already
sees L turning to its right, then fpole-left_DL can be eliminated,
since it would be guaranteed to lead to a match failure. Test
incorporation relies on such spatial relationships.
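
Test incorporation amounts to moving the match test into the
generator of alternatives. A minimal sketch, assuming each
alternative is described only by the turn direction it predicts
(the dictionary layout and the "left"/"right" encoding are
hypothetical):

```python
from typing import Dict, List, Optional


def viable_alternatives(predicted_turns: Dict[str, str],
                        observed_turn: Optional[str]) -> List[str]:
    """Keep only alternatives whose predicted turn can still match.

    With no observation yet (None), nothing can be ruled out.
    """
    if observed_turn is None:
        return list(predicted_turns)
    return [name for name, turn in predicted_turns.items()
            if turn == observed_turn]


alternatives = {"fpole-right": "right", "fpole-left": "left"}
print(viable_alternatives(alternatives, "right"))
```

With L already seen turning right, fpole-left is never generated, so
it cannot cause a later match failure.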

A third passive ambiguity resolution strategy is goal incorporation
(e.g., see [Van Beek and Cohen, 1991]). The key idea here is to
resolve ambiguities only to the extent necessitated by an agent’s
goals. For example, given the reality of the simulation
environment, L’s aircraft often unintentionally deviates from its
intended heading. Given such deviations, L sometimes makes
corrections to its headings. However, D does not really need to
track and disambiguate these small deviations and corrective
actions. It therefore uses fuzz-box filters that disregard
specified deviations in L’s actions. For instance, for
point-at-target_DL, which tracks L’s pointing maneuver (Figure
1-c), the fuzz-box filter disregards 5° of deviation in L’s
heading. Such filtering also helps to avoid tracking of detailed
aspects of state_DL and avoids ambiguities there.
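
A fuzz-box filter on heading can be sketched as a tolerance test on
the angular difference, taking care of wrap-around at 360°. The
function names are hypothetical, and the 5° default follows the
point-at-target_DL example above.

```python
def heading_error(observed_deg: float, predicted_deg: float) -> float:
    """Smallest absolute difference between two headings, in degrees."""
    diff = (observed_deg - predicted_deg + 180.0) % 360.0 - 180.0
    return abs(diff)


def within_fuzz_box(observed_deg: float, predicted_deg: float,
                    tolerance_deg: float = 5.0) -> bool:
    """True if the deviation is small enough to be disregarded."""
    return heading_error(observed_deg, predicted_deg) <= tolerance_deg


print(within_fuzz_box(358.0, 2.0))   # 4 degrees apart across north
print(within_fuzz_box(10.0, 2.0))    # 8 degrees apart
```

Only deviations that fall outside the box are passed on as potential
match failures.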

3.2  Single-State  Backtracking  in  RESC 

Based on the above disambiguation strategies, RESC commits to a
single state_DL and a single operator_DL hierarchy, which track
L’s actions as described in Section 2. However, should this cause a
match failure, single-state backtracking is used to undo some
commitments. As its name suggests, this backtracking takes place
within the context of a single state_DL. Starting from the bottom
of the operator_DL hierarchy, operators are terminated one by one
in an attempt to get alternatives to take their place. Some
alternatives do get installed in the hierarchy, and possibly change
state_DL, but lead to match failures. These are replaced, until
some alternative leads to an operator_DL hierarchy that culminates
in match success.^
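
The procedure just described can be sketched as follows. This is a
simplified illustration under assumptions, not the TacAir-Soar
implementation: the ALTERNATIVES and PREDICTS tables, the state key
L_heading_toward, and the shortened hierarchy are all hypothetical.
The point it tries to show is that only the current state_DL is ever
consulted; no older states are stored or revisited.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, str]

# Hypothetical subgoal alternatives for each operator, worst case first.
ALTERNATIVES: Dict[str, List[str]] = {
    "execute-mission": ["intercept", "run-away"],
    "intercept": ["employ-missile"],
    "employ-missile": ["final-missile-maneuver"],
    "final-missile-maneuver": ["point-at-target"],
    "point-at-target": [],
    "run-away": [],
}

# Hypothetical predictions made by the bottom-level operators.
PREDICTS: Dict[str, Callable[[State], bool]] = {
    "point-at-target": lambda s: s["L_heading_toward"] == "D",
    "run-away": lambda s: s["L_heading_toward"] == "home-base",
}


def extend(op: str, state: State) -> Optional[List[str]]:
    """Chain from op down to a leaf whose prediction matches state."""
    children = ALTERNATIVES[op]
    if not children:
        return [op] if PREDICTS[op](state) else None
    for child in children:
        chain = extend(child, state)
        if chain is not None:
            return [op] + chain
    return None


def single_state_backtrack(hierarchy: List[str], state: State) -> List[str]:
    """Terminate operators bottom-up until an alternative matches state."""
    remaining = list(hierarchy)
    while len(remaining) > 1:
        failed = remaining.pop()  # terminate the bottom operator
        for alt in ALTERNATIVES[remaining[-1]]:
            if alt == failed:
                continue
            chain = extend(alt, state)  # tested against the current state only
            if chain is not None:
                return remaining + chain
    return remaining  # no alternative matched; only the top operator is left


tracked = ["execute-mission", "intercept", "employ-missile",
           "final-missile-maneuver", "point-at-target"]
print(single_state_backtrack(tracked, {"L_heading_toward": "home-base"}))
```

In a running agent the state argument would be the continuously
updated state_DL, so each alternative is tested against the world as
it is at the moment of testing rather than as it was when the failed
operator was first selected.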

Why is this process real-time? The main reason is that backtracking
occurs without a re-examination of past sensor input or mental
recreation of older states_DL. In particular, while backtrack
search would normally involve revisiting old states_DL and
reconsidering the different operators_DL possible in each of those
states (creating an opening for combinatorics), RESC completely
avoids such computation. Furthermore, although RESC does backtrack
over the operator hierarchy, there are three factors that
ameliorate the combinatorics there. First, given RESC’s
situatedness, backtracking remains tied to the present state_DL.
Thus, while a match failure is recognized and the backtrack process
begun, L and D’s aircraft continue to move and turn, changing their
speeds, headings, altitudes, and relative geometric relationships
(e.g., range, collision course, etc.). State_DL is continuously
updated with this latest information. The backtracking process
takes place in the context of this continuously changing state.
Thus, only those alternative operators_DL that are relevant to the
current state_DL get applied. Similarly, in some cases, changes in
state_DL cause portions of the operator_DL hierarchy to terminate
automatically during the backtrack process. In other words, RESC is
continuously dragged forward as the world changes. Second, RESC
does not oblige D to address the match failure before D can execute
any of its own operators_D. Thus D is free to act to the extent it
can. Finally, indeed, if the world were to magically become static,
RESC’s strategy will result in a complex search, although still
within the context of a single state_DL. However, it is unclear if
this is necessarily problematic: a static world should possibly
merit a more thorough search.

^ In a few cases, there are pending changes related to ambiguities
in state_DL, e.g., has L detected D? These are applied first,
hoping they cause changes to operator_DL and lead to success.

Let us consider some examples of single-state backtracking. As a
simple example, suppose D has committed to the model_DL in Figure
2-b. Initially, point-at-target_DL has match success in that, as
predicted, L indeed starts turning towards D (see Figure 3-a for an
illustration). However, L really has decided to run away; so it
continues turning 180° without stopping when pointing at D (Figure
3-b). This leads to a match failure in the operator_DL hierarchy.
Single-state backtracking now ensues, terminating operators
beginning from the bottom of the hierarchy. Finally, intercept_DL
is terminated and replaced by run-away_DL. This predicts L to be
turning towards its home-base, which successfully matches L’s
actions (Figure 3-c). Thus, D successfully applies run-away_DL,
predicting and matching L’s actions, without mentally recreating
the state_DL in which L may have initiated its run-away maneuver.

Figure 3: L continues to turn to run away.

A slightly more complex example involves situations where L is
engaging in a beam maneuver. Here, D initially matches
fpole-right_DL, and even infers L’s missile firing, as part of
state_DL. However, as L keeps turning, there is soon a match
failure, causing D to backtrack until beam-right_DL successfully
matches. There are two key points here. First, again D is
successful in applying beam-right_DL, without mentally recreating
the state_DL in which L may have initiated its beam maneuver.
Second, D’s earlier inference of L’s missile firing is not removed,
even though it is based on a sequence of operators that eventually
led to a match failure. This is because it is difficult for D to
decide if L was initially maneuvering to fire a missile and then
switched to beam, or if it was always engaged in beam. Not knowing
any better, D does not eliminate the earlier inference from
state_DL. Fortunately, when aircraft turn 90° to beam, they cannot
provide radar guidance to their missiles. Therefore, with L’s beam,
D infers that the missile that it earlier inferred on state_DL has
lost guidance and become harmless. The end result is identical to a
case where D had successfully tracked L’s beam maneuver, without
the failed intermediate inference of an fpole-right_DL maneuver.

We  have  so  far  found  RESC’s  single-state  backtrack¬ 
ing  to  be  successful  in  the  air-combat  simulation  domain 
(see  Section  4).  Given  the  potential  application  of  this 
approach  for  other  areas  of  real-time  comprehension,  it 


is useful to analyze the reasons behind its success. Towards this
end, consider first the following simplified and abstract
characterization of a successful application of single-state
backtracking: L initiates some maneuver β at time T0. However, at
T0, D attempts to match it by applying an operator α_DL to
state_DL^{T0} (which denotes state_DL at time T0). At time T0+τ, in
state_DL^{T0+τ}, D recognizes a match failure with α_DL. It
backtracks and applies β_DL to state_DL^{T0+τ}. The key observation
here is that despite the time delay τ and the intervening
application of α_DL, β_DL is successful in predicting and matching
L’s maneuvers, as though it were applied to state_DL^{T0}. Based on
this observation, in terms of operator preconditions and effects,
we can infer at least three requirements that need to be met for
single-state backtracking to work. In the following, we list these
requirements, and illustrate how pilot agents currently adhere to
them:

1. The preconditions are satisfied in state_DL^{T0+τ} in much the
same way as in state_DL^{T0}: For pilot agents, operator
preconditions are expressed so they do not test specific positions
of aircraft, but rather abstracted features of state_DL (similar to
the fuzz-boxes in Section 3.1) that are unlikely to change in τ.
That is, abstracted features tested by preconditions change at a
rate smaller than 1/τ.

2. The effects of β_DL when applied to state_DL^{T0+τ} are
equivalent to the effects of β_DL when applied to state_DL^{T0}:
This is achieved by expressing operator effects relative to some
feature of state_DL that is unlikely to have changed in the
intervening time period τ. For instance, the effect of run-away_DL
predicts L is headed towards its home base; the location of this
base is unlikely to change within τ. Similarly, the effects of
operators such as beam-right_DL are expressed in relative terms as
L turning to achieve 90° angle-off, which is an angle formed by L’s
heading relative to the straight line joining D and L (while this
line does change its position in τ, given the range between D and L
the change is small, and gets absorbed by the fuzz-boxes). If the
effects were expressed instead as turning 90° from L’s current
heading, they would have provided very different results at T0 and
T0+τ, defeating single-state backtracking. Overall, the above seems
possible because operators in this environment typically strive to
achieve positions relative to slowly changing reference points,
such as turning to a particular heading relative to an opponent’s
aircraft, or relative to a waypoint such as the home-base.

3. The effects of α_DL as applied to state_DL^{T0} are eliminated
at some time T0+τ before they cause inconsistencies in D’s
response: As seen in an example above, even though L’s missile
firing was inferred, and this inference was not “cleaned up” upon
recognition of a match failure, L’s future maneuver automatically
nullifies the effect of that inference. Fortunately, typical
operator_DL applications do not commit to such inferences on
state_DL. For those that do commit, these commitments get removed
by future maneuvers or become irrelevant.
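
The second requirement can be checked numerically. The sketch below
compares a beam-right prediction expressed relative to the line
joining L and D with one expressed relative to L's current heading.
Everything specific is an assumption made for illustration: flat
two-dimensional coordinates, headings measured clockwise from north,
"right" taken as bearing plus 90°, and the positions and headings
used.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (east, north), arbitrary distance units


def bearing_deg(from_xy: Point, to_xy: Point) -> float:
    """Compass bearing from one point to another (0 = north, clockwise)."""
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def beam_right_relative(l_xy: Point, d_xy: Point) -> float:
    """Predicted heading for 90 degrees angle-off from the L-to-D line."""
    return (bearing_deg(l_xy, d_xy) + 90.0) % 360.0


def beam_right_absolute(l_heading_deg: float) -> float:
    """Predicted heading if the effect were '90 degrees from L's heading'."""
    return (l_heading_deg + 90.0) % 360.0


# At T0: L points straight at D, which is 40 units due north of L.
rel_t0 = beam_right_relative((0.0, 0.0), (0.0, 40.0))
abs_t0 = beam_right_absolute(0.0)

# At T0 + tau: D has drifted slightly and L has already turned 40 degrees.
rel_t1 = beam_right_relative((0.0, 0.0), (1.0, 39.0))
abs_t1 = beam_right_absolute(40.0)

print(round(rel_t1 - rel_t0, 2), abs_t1 - abs_t0)
```

Under these assumptions the relative prediction shifts by roughly a
degree and a half, which a 5° fuzz-box absorbs, while the
heading-relative prediction shifts by the full 40° that L has
already turned.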

In some cases, L quickly terminates its maneuver β within time
period τ, and initiates a new one, γ, before T0+τ. Here, given
RESC’s situatedness, at time T0+τ, D completely skips tracking β,
and tracks γ with γ_DL instead. Fortunately, since D’s initial
attempt is to apply worst-case operators_DL, there is at least the
assurance that what is skipped, β, is not among the worst of the
possibilities.

4  Implementation  and  Evaluation 

To  understand  the  effectiveness  of  the  RESC  approach, 
we have implemented it as part of an experimental variant of
TacAir-Soar [Tambe et al., 1995]. The current
TacAir-Soar  system  contains  about  2000  rules,  and  au¬ 
tomated  pilots  based  on  it  have  participated  in  combat 
exercises  with  expert  human  pilots.  Our  experimental 
version  —  created  to  investigate  the  RESC  approach  — 
contains  about  950  of  these  rules.  Proven  techniques 
from this experimental version, called TacAir-Soar^*^^,
are  transferred  back  into  the  original  TacAir-Soar. 

There  are  at  least  two  aspects  to  understanding  the 
effectiveness  of  TacAir-Soar^^^,  The  first  aspect  is 
whether  the  current  approach  enables  D,  the  TacAir- 
Soar^^^  pilot  agent,  to  track  its  opponents’  actions 
accurately  in  real-time.  We  conducted  two  sets  of  exper¬ 
iments  to  address  this  issue.  The  first  set  involved  run¬ 
ning  Soar-vs-Soar  air-combat  simulation  scenarios  (as 
outlined  by  the  human  experts).  The  results  from  these 
experiments  are  presented  in  Table  1. 


Scn.   Num.      Total       % operators in    % of agent tracking
num.   oppnts.   operators   agent tracking    operators in match failure
----   -------   ---------   ---------------   --------------------------
1      1         37          S%                0%
2      1         133         45%               17%
3      2         167         50%               16%
4      2         175         64%               17%
5      4         142         63%               11%

Table 1: Results of Soar-vs-Soar experiments.

The  first  column  lists  the  scenario  number.  The  sec¬ 
ond  column  lists  the  number  of  opponents  that  D  faces 
in  each  scenario  —  this  varies  from  one  to  four  in  these 
scenarios.  The  third  column  indicates  the  total  num¬ 
ber  of  D’s  operator  executions  in  each  scenario.  This 
includes  operators  for  D’s  own  actions,  as  well  as  for 
tracking opponent actions. The total number provides
some  indication  of  the  comparative  complexity  of  the 
different  scenarios.  Note  that  these  operators  are  not  all 
applied  in  a  sequence  at  regular  intervals;  D  often  waits 
in  between  applications  as  it  tries  to  get  into  different 
positions.  Indeed,  despite  the  differences  in  the  num¬ 
ber  of  operators,  the  total  time  per  scenario  is  about 
the  same,  approx  5  minutes.  The  fourth  column  shows 
the  percentage  of  operator  executions  involved  in  agent 
tracking.  This  percentage  clearly  depends  on  the  num¬ 
ber  of  maneuvers  that  the  opponents  execute,  and  the 
number  of  opponents.  The  key  point  here  is  that  agent 


tracking  is  a  non-trivial  task  for  D.  Furthermore,  higher 
percentages  of  operator  executions  may  be  dedicated  to 
agent  tracking  with  increased  numbers  of  opponents. 

The fifth column shows the percentage of agent tracking
operators involved in match failures (counting operators
at the bottom of the hierarchy that encountered
the failure, but not their parents). The main point here
is that the overall percentage of these operators is low; at
most 17% of the agent tracking operators are involved
in match failures.
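
As a rough illustration of how the last two columns of Table 1 combine, the following Python sketch converts the reported percentages for scenarios 2 through 5 into approximate operator counts. The counts are estimates only, since the table reports rounded percentages; the helper name approx_counts is introduced here purely for illustration.

```python
# Rows of Table 1 for scenarios 2-5:
# (scenario, opponents, total operators,
#  % of operators used for agent tracking,
#  % of tracking operators involved in match failures)
TABLE_1 = [
    (2, 1, 133, 45, 17),
    (3, 2, 167, 50, 16),
    (4, 2, 175, 64, 17),
    (5, 4, 142, 63, 11),
]


def approx_counts(total, pct_tracking, pct_failed):
    """Estimate tracking-operator and match-failure counts from percentages."""
    tracking = total * pct_tracking / 100.0
    failed = tracking * pct_failed / 100.0
    return round(tracking), round(failed)


for scen, opps, total, pct_t, pct_f in TABLE_1:
    tracking, failed = approx_counts(total, pct_t, pct_f)
    print(f"scenario {scen}: ~{tracking} tracking operators, "
          f"~{failed} involved in match failures")
```

Under this reading, scenario 2 has roughly 60 tracking operators, of which roughly 10 are involved in match failures.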

In all of these cases, D is successful in tracking opponents
in real-time so as to react appropriately. Even
in cases where D encounters match failures, it is able
to backtrack to track the on-going activities in real-time
and respond appropriately. However, as the number of
opponents increases, D does face resource contention
problems. With four opponents, it is unable to track
the actions of all of the agents in time, and gets shot
down (hence fewer operators). This resource contention
issue is under active investigation [Tambe, 1995].

Our second set of experiments involved Soar-vs-ModSAF
simulated air-combat scenarios. ModSAF-based
[Calder et al., 1993] pilot agents are controlled
by finite state machines combined with arbitrary pieces
of code, and do not exhibit high behavioral flexibility.
While D was in general successful in agent tracking in
these experiments (it did recognize the maneuvers in
real-time and respond to them), one interesting issue
did come up. In particular, in one of the scenarios here,
there was a substantial mismatch between D’s worst-case
assumptions regarding its opponent’s missile capabilities and the
actual capabilities, leading to tracking failures. Dealing
with model mismatch is also an issue for future work.

The second aspect to understanding the effectiveness
of TacAir-Soar^RESC is some quantitative estimate of the
impact of agent tracking on improving D’s overall performance.
In general, this is a difficult issue to address (see
for instance the debate in [Hanks et al., 1993]). Nonetheless,
we can at least list some of the types of benefits
that D accrues from this capability. First, agent tracking
is crucial for D’s survival. Indeed, it is based on
agent tracking that D can recognize an opponent’s missile
firing behavior and evade it. Second, agent tracking
improves D’s overall understanding of a situation, so it
can act/react more intelligently. For instance, if an opponent
is understood to be running away, D can chase
it down, which would be inappropriate if the opponent
is not really running away. Similarly, if D is about to
fire a missile, and it recognizes that the opponent is also
about to do the same, then it can be more tolerant of
small errors in its own missile firing position so that it
can fire first. Finally, agent tracking helps D in providing
a better explanation of its behaviors to human
experts. (Such an explanation capability is currently
being developed [Johnson, 1994].) If human experts see
D as performing its task with an inaccurate understanding
of opponents’ actions, they will not have sufficient
confidence to actually use it in training.

5 Lessons Learned

This paper presented an approach called RESC, for
agent tracking in real-time dynamic environments. Our
investigation was based on a real-world synthetic environment
that has already been used in a large-scale operational
military exercise [Tambe et al., 1995]. Lessons
learned from this investigation, as embodied in RESC,
are as follows:

•  To track other agents’ flexible and reactive behaviors:
Reuse the architectural mechanisms that support
an agent’s own flexible/reactive behaviors in
service of tracking others’ behaviors.

•  To address ambiguities in real-time: Quickly commit
to a single interpretation, and use single-state
backtracking to recover from erroneous commitments.

•  To address real-time issues in general: Keep tracking
firmly tied to the now, i.e., to the present state.
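
The second and third lessons can be illustrated with a toy sketch. The Python code below is not the RESC implementation (which is built from Soar operators); it only shows the general idea of committing to one interpretation of an opponent's behavior and, on a match failure, switching to another interpretation that fits the present observation, without replaying history. All class names, interpretation names, and thresholds are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Observation = Dict[str, float]


@dataclass
class Interpretation:
    name: str
    matches: Callable[[Observation], bool]  # does the present state fit?


@dataclass
class Tracker:
    candidates: List[Interpretation]          # in order of preference
    current: Optional[Interpretation] = None
    log: List[str] = field(default_factory=list)

    def update(self, obs: Observation) -> Optional[str]:
        if self.current is None:
            # Commit quickly to the first interpretation that fits.
            self.current = self._first_match(obs, exclude=None)
        elif not self.current.matches(obs):
            # Match failure: backtrack using only the present observation.
            self.log.append(f"match failure: {self.current.name}")
            self.current = self._first_match(obs, exclude=self.current)
        return self.current.name if self.current else None

    def _first_match(self, obs, exclude):
        for cand in self.candidates:
            if cand is not exclude and cand.matches(obs):
                return cand
        return None


# Hypothetical interpretations of an opponent's turn.
CANDIDATES = [
    Interpretation("preparing-to-fire", lambda o: o["range_nm"] < 30),
    Interpretation("running-away", lambda o: o["closing_kts"] < 0),
]

tracker = Tracker(CANDIDATES)
print(tracker.update({"range_nm": 25, "closing_kts": -50}))   # ambiguous
print(tracker.update({"range_nm": 32, "closing_kts": -300}))  # repaired
```

In this example the first observation fits both interpretations, so the tracker commits to the preferred one; the second observation contradicts that commitment and triggers the repair.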

One key issue for future work is investigating the generality
of these lessons by applying RESC to other competitive
and collaborative multi-agent domains. One
candidate that has been suggested is a real-time multi-robot
domain where robots track other robots or humans
to collaborate in a task by observation (rather than by
communication) [Kuniyoshi et al., 1994]. Beyond agent
tracking, there is some indication that RESC could apply
in other real-time comprehension tasks. For instance, a
RESC-type strategy has been previously used in a real-time
language comprehension system [Lewis, 1993]. This
system also commits to a single interpretation of an input
sentence despite ambiguity, and attempts to repair
the interpretation in real-time when faced with parsing
difficulties. We hope that investigating these broader
applications will lead to an improved understanding of
agent tracking and comprehension.

1. Abstract

The Soar/IFOR project has been developing
intelligent pilot agents (henceforth IPs) for
participation in simulated battlefield environments.
While previously the project was mainly focused on
IPs for fixed-wing aircraft (FWA), more recently, the
project has also started developing IPs for rotary-wing
aircraft (RWA). This paper presents a
preliminary report on the development of IPs for
RWA. It focuses on two important issues that arise in
this development. The first is a requirement for
reasoning about the terrain: when compared to an
FWA IP, an RWA IP needs to fly much closer to the
terrain and in general take advantage of the terrain for
cover and concealment. The second issue relates to
code and concept sharing between the FWA and
RWA IPs. While sharing promises to cut down the
development time for RWA IPs by taking advantage
of our previous work for the FWA, it is not
straightforward. The paper discusses the two issues in
some detail and presents our initial resolutions of
these issues.

2. Introduction

The Soar/IFOR project has been developing
intelligent pilot agents (IPs) for simulated battlefield
environments (Laird et al., 1995, Rosenbloom et al.,
1994, Tambe et al., 1995). Until Summer 1994, the
project was focused on building IPs for simulated
fixed-wing aircraft (FWA), including air-to-air
fighters and ground-attack aircraft. Since July 1994,
we have begun developing IPs for simulated rotary-wing
aircraft (RWA), specifically, AH-64 Apache
attack helicopters.

While there are similarities in an RWA and an
FWA pilot’s missions (e.g., employing weapons on
targets, flying mission-specified routes), there are
also some important differences. One key difference
is reasoning about the terrain. For example, an RWA
pilot’s mission can involve flying nap-of-the-earth
(NOE), where it needs to fly only about 25 feet above
ground level, while avoiding obstacles. It may also
involve flying through a valley, or around a forested
region. The mission may also involve hiding
(masking) behind a ridge, popping up to spot enemy
targets, and remasking in a new hiding position.
Figure 1 provides an illustration of this type of terrain
reasoning. It presents a snapshot, taken from
ModSAF’s plan-view display (Calder et al., 1993), of
a typical scenario involving Soar-based RWA IPs.
There are two RWA in the scenario, just behind the
ridge, indicated by the contour lines. The other
vehicles in the figure are a convoy of “enemy”
ground vehicles (tanks and anti-aircraft vehicles)
controlled by ModSAF. The RWA are
approximately 2.5 miles from the convoy. The IPs
have hidden their helicopters behind the ridge (their
approximate hiding area is specified to them in
advance). They unmask these helicopters by popping
out from behind the ridge to launch missiles at the
enemy vehicles, and quickly remask (hide) by
dipping behind the ridge to survive retaliatory
attacks. They subsequently change their hiding
position to avoid predictability when they pop out
later.


Figure 1: A snapshot of ModSAF’s simulation of an
air-to-ground  combat  situation. 


Thus, the development of RWA IPs brings up the
novel issue of terrain reasoning, not addressed in
previous work on Soar/IFOR agents. There has been
much work on terrain reasoning in ModSAF in their
development of semi-automated forces or SAFs
(Calder et al., 1993). That work has so far primarily
focused on ground-based SAFs (e.g., (Longtin,
1994)), although there is a recent effort focused on
terrain reasoning for RWA (Tan, 1995). Outside the
arena of automated forces, terrain reasoning in the
form of route planning and execution has been
addressed extensively in AI and Robotics. The focus
of much of this work is on 2D routes (Denton and
Froeberg, 1984, Khatib, 1986, Lozano-Perez and
Wesley, 1979, Mitchell, 1990), a category that
includes some previous work within Soar (Stobie et
al., 1992), although some efforts have also
attacked the 3D route planning problem (Bose et al.,
1987, Rao and Arkin, 1989). Other aspects of terrain
reasoning such as tactical situation assessment
(McDermott and Gelsey, 1987) and hiding (Stobie et
al., 1992) have also received some attention, although
not nearly as much as route planning. As discussed in
Section 3, the pure route planning approaches from
this literature are unlikely to address the terrain
reasoning challenge facing the RWA IPs, which is to
accomplish these tasks in real-time, given a realistic
3D terrain database. A hybrid solution combining
some abstract plans with reactivity is currently being
investigated.

Given the similarities between the FWA and RWA
IPs, concept and code sharing between the two is a
real possibility. Sharing would speed up
development of RWA IPs by taking advantage of our
previous work on FWA. However, the differences,
such as the terrain reasoning capability above,
imply that sharing is not straightforward. There have
been some previous efforts aimed at facilitating reuse
of code and concepts among Soar systems. These
efforts have typically focused on reuse of individual
capabilities, such as inductive learning (Rosenbloom
and Aasman, 1990), or natural language (Lewis,
1993, Rubinoff and Lehman, 1994) capabilities. The
novel issue here is that a large fraction of the FWA IP
structure is potentially reusable in developing RWA
IPs and such reuse needs to be facilitated.

The rest of this paper provides more details on
these  two  issues.  Section  3  focuses  on  terrain 
reasoning.  Section  4  discusses  the  issue  of  code  and 
concept  sharing  between  Soar-based  FWA  and  RWA 
IPs.  We  will  assume  some  familiarity  with  the  Soar 
architecture  (Laird,  Newell,  and  Rosenbloom,  1987, 
Rosenbloom,  et  al.,  1991). 

3. Terrain Reasoning

The overall terrain reasoning tasks for an RWA IP
may be subdivided into two categories. The first is to
fly from a given source to a destination, while
abiding by mission-specified constraints regarding
the flight methods. A flight method primarily
specifies maintenance of a certain air-speed and
altitude above ground level. In particular, a
high-level flight requires that the RWA fly more than
200 feet above ground level with air-speed as high as
145 knots. A low-level flight requires that the RWA
fly 100-200 feet above ground level, while
maintaining a maximum air-speed of 100 knots. A
contour flight requires the RWA to fly between
25-100 feet above ground level, but with a maximum
air-speed of 70 knots. An NOE flight requires the
RWA to fly within just 25 feet above ground level,
with a maximum air-speed of 40 knots. Additionally,
an NOE flight may require that an RWA fly through
a valley along a hillside, or through a narrow clear
corridor in a forested region. The second category of
terrain reasoning tasks involves an RWA IP’s
activities once it successfully follows its route to its
battle area, and possibly engages enemy vehicles. Its
activities in this area involve selecting and occupying
good hiding positions (behind a ridge or a forested
region) and flying between hiding positions while
remaining concealed from a possibly mobile enemy.
It may also involve reasoning about possible enemy
hiding positions.
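
The four flight methods above amount to a small table of altitude bands and speed limits. The Python sketch below encodes them as data and checks whether a given altitude and air-speed comply with a method. The numeric limits are taken from the text; the names, the treatment of band boundaries as inclusive, and the lower bound of 0 feet for NOE are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlightMethod:
    name: str
    min_agl_ft: float            # lowest altitude above ground level
    max_agl_ft: Optional[float]  # None: no upper bound is stated
    max_speed_kts: float


FLIGHT_METHODS = {
    "high-level": FlightMethod("high-level", 200, None, 145),
    "low-level": FlightMethod("low-level", 100, 200, 100),
    "contour": FlightMethod("contour", 25, 100, 70),
    "NOE": FlightMethod("NOE", 0, 25, 40),  # lower bound of 0 is assumed
}


def complies(method_name: str, agl_ft: float, speed_kts: float) -> bool:
    """True if altitude and air-speed are within the method's limits."""
    m = FLIGHT_METHODS[method_name]
    above_floor = agl_ft >= m.min_agl_ft
    below_ceiling = m.max_agl_ft is None or agl_ft <= m.max_agl_ft
    speed_ok = 0 <= speed_kts <= m.max_speed_kts
    return above_floor and below_ceiling and speed_ok


print(complies("NOE", 20, 35))         # within 25 ft and under 40 knots
print(complies("low-level", 150, 120)) # too fast for low-level flight
```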

For both categories of tasks, one key issue for an
RWA IP is to execute them in the context of a large-scale
and realistic 3D terrain database, with features
such as rivers, ridges, valleys, hills and forested
regions. A second key issue is that given its
complexity, the cost of sensing and processing large
tracts of the terrain database is non-trivial. A third
related issue is that an IP has to exhibit human-like
behavior in performing these terrain reasoning tasks.
Thus, it should not make use of information that a
human pilot is unlikely to obtain. For example, as
with a human pilot, an IP should plan routes using a
map of the terrain database (which possibly may be
inaccurate), rather than using the actual terrain
database (which would always be 100% accurate). A
final issue is that an IP has to perform its tasks in
real-time. The following two subsections illustrate
how these issues are addressed for each of the two
types of tasks above.


3.1.  Route  Flying 

For the task of route flying, one possible approach
for addressing the above issues would be to use one
of a variety of path-planning methods that provides a
very detailed 3D point-to-point route, with little need
or freedom to modify the given route (Stobie et al.,
1992, Bose et al., 1987, Rao and Arkin, 1989, Denton
and Froeberg, 1984). One such approach, based on
weighted-region path planning (Mitchell, 1990), is to
conceptually divide a map of the terrain into 3D cells
(cubes), assign an appropriate cost to each cell that
reflects mission-specified constraints, and then search
for a minimum cost path through the cells. One
advantage of such an approach is that an RWA IP
need not sense the terrain database in any detail, but
rather just enough to follow its plan. In addition, the
low sensing overhead would facilitate real-time task
performance. However, there are several problems
with such an approach. First, given the complexity of
the terrain, this approach would require a significant
initial computational effort to create and then search
the cells. Second, it could be wasteful given the
realism of the RWA model and its flight controls:
it will not be possible for a Soar-based IP to precisely
control an RWA to follow such a detailed route, and
it will end up having to reactively improvise the path
or replan. The original planner could potentially take
these realistic flight controls into account when
developing a plan (so that no on-line replanning
may be required), but that would further increase
the complexity of planning. Third, if the map of the
terrain is inaccurate or incomplete, the plan generated
could be inaccurate as well. Even if the map were
completely accurate (or if the IP were using the
terrain database itself rather than a map), there could
still be deviations from the planned route caused by
an unexpected encounter with hostile or friendly
vehicles. Thus, an IP may not be able to rely on just
its original planned route; it may need to replan.
Finally, human pilots typically do not rely on such
detailed plans; and thus in forcing IPs to follow such
plans, we are likely deviating from our goal of
building human-like IPs.
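
To make the cell-based alternative concrete, the sketch below searches a small 3D grid of cells for a minimum cost path using Dijkstra's algorithm. It is a simplified stand-in for weighted-region path planning, not the method of Mitchell (1990): cells are unit cubes, movement is restricted to the six face neighbors, and the cost values (cheap low cells, expensive exposed high cells, a blocked cell for a ridge) are invented for illustration.

```python
import heapq

# Moves to the six face-adjacent cells (dz, dy, dx).
NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
             (0, -1, 0), (0, 0, 1), (0, 0, -1))


def min_cost_path(cost, start, goal):
    """Dijkstra over a 3D grid. cost[z][y][x] is the cost of entering
    a cell; None marks a blocked cell. Returns (path, total_cost)."""
    nz, ny, nx = len(cost), len(cost[0]), len(cost[0][0])
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        dist, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if dist > best[cell]:
            continue  # stale queue entry
        z, y, x = cell
        for dz, dy, dx in NEIGHBORS:
            n = (z + dz, y + dy, x + dx)
            if not (0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx):
                continue
            step = cost[n[0]][n[1]][n[2]]
            if step is None:
                continue
            new_dist = dist + step
            if new_dist < best.get(n, float("inf")):
                best[n] = new_dist
                prev[n] = cell
                heapq.heappush(heap, (new_dist, n))
    if goal not in best:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], best[goal]


# Two altitude layers, one row, four columns. The low layer is cheap
# (concealed) but blocked by a ridge at x=2; the high layer is costly.
GRID = [
    [[1, 1, None, 1]],   # z = 0: low altitude
    [[5, 5, 5, 5]],      # z = 1: high altitude (exposed)
]

path, total = min_cost_path(GRID, (0, 0, 0), (0, 0, 3))
print(path, total)
```

Even in this toy grid the planner must enumerate and cost every cell, which hints at the computational effort a realistic terrain database would demand.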

So instead, a Soar-based IP follows a hybrid
strategy that combines a plan-based and a reactive
strategy. In particular, it relies on more abstract route
plans that provide it just two to three intermediate
points.^1 The IP then executes these route plans while
reacting to sensory information that enables it to
abide by the mission-specified constraints. For ideal
human-like IPs, this sensory information should be
precisely what a human pilot would obtain visually
by looking out the window. Unfortunately, for an IP,
such visual processing is likely to be extremely
complex and expensive. Therefore, special
inexpensive sensors have been designed that
approximate such visual processing. One such sensor
is the look-ahead altitude sensor, or LAS. LAS
is slaved to the parameters supplied by the IP. The IP
sets parameters for LAS that specify a lookahead
range and orientation, which in turn specify a line
segment of specific length and orientation originating
from the IP’s current location. Once these
parameters are set, LAS scans the terrain database
repeatedly (in fact, each time ModSAF schedules the
agent for execution), and returns the highest altitude
value along the specified line segment. For instance,
to fly NOE, an IP sets LAS’s parameters to a
lookahead range of 50 meters, and orientation in the
direction of its flight. The pilot reacts to LAS’s
response by modifying the altitude of its helicopter to
be approximately 25 feet above the highest point.^2

^1 At present, these abstract routes are provided by a human;
although given that they are abstract, planning these routes is
expected to be much less complex.
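
The behavior of LAS described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ModSAF or Soar/IFOR code: the terrain is a hypothetical function from (x, y) in meters to altitude in meters, the heading is measured clockwise from north, and the segment is sampled at 100 evenly spaced points (the sample count the text reports for LAS).

```python
import math

FEET_PER_METER = 3.28084


def las_scan(terrain_alt_m, x, y, heading_deg, lookahead_m, n_samples=100):
    """Highest terrain altitude (m) sampled along the lookahead segment
    that starts at (x, y) and points along heading_deg."""
    h = math.radians(heading_deg)
    dx, dy = math.sin(h), math.cos(h)  # heading clockwise from north (+y)
    highest = -math.inf
    for i in range(1, n_samples + 1):
        d = lookahead_m * i / n_samples
        highest = max(highest, terrain_alt_m(x + d * dx, y + d * dy))
    return highest


def noe_target_altitude_m(highest_m, clearance_ft=25.0):
    """Altitude to hold: about 25 feet above the highest point seen."""
    return highest_m + clearance_ft / FEET_PER_METER


def ridge(x, y):
    """Hypothetical terrain: flat at 100 m with a 20 m ridge to the north."""
    return 120.0 if 30.0 <= y <= 40.0 else 100.0


highest = las_scan(ridge, 0.0, 0.0, heading_deg=0.0, lookahead_m=50.0)
print(highest, round(noe_target_altitude_m(highest), 2))
```

Calling las_scan once per scheduling cycle with the NOE settings (50 meters, current heading) and steering toward noe_target_altitude_m mirrors the reactive loop in the text.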

The top half of Figure 2 shows a pilot agent
making use of LAS to fly NOE. The shaded portion
in the figure is a profile of the terrain, while the
dashed line is a profile of the helicopter flying NOE.
The straight lines indicate LAS’s lookahead range
while scanning the database. The bottom half of
Figure 2 indicates a longer lookahead range, and the
change in the flight profile that results.


Figure 2: Illustrations of lookahead altitude sensor. LAS scans
the terrain database each time the agent is scheduled
for execution (illustrations not from an actual run).


The precise value of the lookahead range is
determined to a large extent by the speed of the
RWA. In particular, for an NOE flight, an IP
currently flies conservatively at a speed of 20 knots,
with 50 meters lookahead, which gives it about 5
seconds to change its altitude. The other flight
methods, specifically contour, low-level and high-level
flight, require that the RWA fly at a higher
speed. This in turn requires that the IP set a longer
lookahead range to give itself more time to react.
Speed is however not the only factor determining the
lookahead range. It is also dependent on the type of
flight profile desired. For instance, at its speed of 80
knots, an IP could potentially sustain the altitude
required for its low-level flight with a lookahead of
just 200-300 meters. However, the flight profile
generated follows the terrain much too closely; it is
not as smooth as the flight profile that results from a
human pilot’s low-level flight (at least as indicated
by the experts). Therefore, the low-level flight uses a
much longer lookahead range of 1500 meters. The
high-level flight uses a lookahead range of 5000
meters.
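
The relation between speed, lookahead range, and reaction time quoted above is simple arithmetic, shown here as a short Python check. The only outside fact used is the definition of the knot (one nautical mile, 1852 meters, per hour); the function name is introduced for illustration.

```python
KNOT_IN_M_PER_S = 1852.0 / 3600.0  # one nautical mile (1852 m) per hour


def reaction_time_s(lookahead_m, speed_kts):
    """Seconds until the aircraft reaches the end of its lookahead segment."""
    return lookahead_m / (speed_kts * KNOT_IN_M_PER_S)


# NOE flight: 50 m lookahead at 20 knots.
print(round(reaction_time_s(50, 20), 1))
# Low-level flight at 80 knots: a 250 m lookahead versus the 1500 m used.
print(round(reaction_time_s(250, 80), 1))
print(round(reaction_time_s(1500, 80), 1))
```

The NOE case works out to about 4.9 seconds, consistent with the "about 5 seconds" in the text. At 80 knots, 1500 meters corresponds to roughly 36 seconds of lookahead against roughly 6 seconds for 250 meters, which is why the longer range yields a smoother profile.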

Unfortunately, long lookahead ranges in LAS
could potentially hinder an IP’s real-time
performance. Therefore, to lower its cost, LAS
samples precisely 100 points along the specified line
segment irrespective of the lookahead range. Thus,
despite the variation in the lookahead range in Figure
2, LAS will scan precisely 100 points. This sampling
resolution may appear to be very low, with the
potential of missing high altitude cliffs. However,
LAS’s repeated scanning in effect improves its
sampling resolution. In particular, since an RWA
progresses towards its destination between two scans,
successive scans sample slightly different points. In
fact, each successive scan samples 99 points in the
neighborhood of the points from its previous scan (on
the same line segment), and one new point. This
resolution could still be insufficient for some types of
terrain. For instance, if the terrain is an urban
landscape with a sparse population of pin-shaped
high-altitude structures,^3 there is a small possibility
that LAS may miss those in its scanning. In such
cases, there may be a need to increase the sampling
resolution. However, the 100 point scans have so far
proved adequate over the terrain database used in our
experiments (the RWA have not crashed).

^2 RWA agents in ModSAF appear to follow a similar technique
(Tan, 1995).
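
The effect of fixed 100-point sampling can be quantified with a small Python sketch. It computes the spacing between samples for the lookahead ranges mentioned in the text, and shows how samples from successive scans interleave as the aircraft advances. The even spacing of samples and the distance advanced between scans (4 meters here) are assumptions; the text does not state ModSAF's scheduling interval.

```python
def sample_offsets_m(lookahead_m, n_samples=100):
    """Distances ahead of the aircraft at which one scan samples terrain."""
    step = lookahead_m / n_samples
    return [step * i for i in range(1, n_samples + 1)]


def covered_positions_m(lookahead_m, advance_per_scan_m, n_scans,
                        n_samples=100):
    """Distinct along-track positions sampled over successive scans."""
    seen = set()
    for k in range(n_scans):
        origin = k * advance_per_scan_m  # aircraft position at scan k
        for off in sample_offsets_m(lookahead_m, n_samples):
            seen.add(round(origin + off, 3))
    return sorted(seen)


for rng in (50, 1500, 5000):
    print(rng, "m lookahead ->", rng / 100, "m between samples")

# Three scans at 1500 m lookahead, advancing a hypothetical 4 m per scan.
positions = covered_positions_m(1500, 4, 3)
print(len(positions), positions[:6])
```

With a 1500 meter lookahead a single scan samples every 15 meters, but three successive scans in this example already sample 300 distinct positions, which is the sense in which repeated scanning improves the effective resolution.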

Figure  3  presents  a  flight  profile  from  an  actual  run 
of  a  Soar-based  RWA  using  the  contour  flight 
method.  Figure  4  presents  a  flight  profile  from 
another  run  of  a  Soar-based  RWA  over 
approximately  the  same  terrain,  but  using  the  NOE 
flight  method.  The  shaded  portion  indicates  the 
terrain,  while  the  dashed  line  indicates  the  actual 
flight  profile.  IPs  smoothen  out  the  flight  by  using 
fuzz-boxes  (McDermott  and  Davis,  1984)  to  avoid 
excessive  altitude  adjustments. 
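
One way to picture the smoothing effect of a fuzz-box is as a tolerance band around the current altitude: small changes in the desired altitude are ignored, and only changes that leave the band trigger an adjustment. The Python sketch below illustrates that reading. Treating the fuzz-box as a simple dead band, and the 10 foot tolerance, are assumptions made for illustration rather than details from the Soar/IFOR agents.

```python
def smooth_altitude(current_alt_ft, desired_alt_ft, fuzz_ft=10.0):
    """Hold the current altitude while the desired altitude stays
    inside the fuzz-box; otherwise adopt the desired altitude."""
    if abs(desired_alt_ft - current_alt_ft) <= fuzz_ft:
        return current_alt_ft
    return desired_alt_ft


def fly_profile(start_alt_ft, desired_alts_ft, fuzz_ft=10.0):
    """Apply the fuzz-box to a sequence of desired altitudes."""
    profile = []
    alt = start_alt_ft
    for desired in desired_alts_ft:
        alt = smooth_altitude(alt, desired, fuzz_ft)
        profile.append(alt)
    return profile


print(fly_profile(100, [100, 104, 97, 120, 125, 90]))
```

In this example the small fluctuations around 100 feet and around 120 feet produce no adjustment, so six desired values result in only two altitude changes.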

Figure 3: Illustration of a contour flight
from  an  actual  run. 

Similar low-cost, LAS-type sensors approximating
a  human  pilot’s  visual  input  are  currently  being 
designed  to  enable  the  RWA  pilots  to  fly  through 
valleys. 


^3 A clock tower would be one example of such a structure.


Figure  4:  Illustration  of  an  NOE  flight 
from  an  actual  run. 


3.2. Hiding

Once an RWA IP reaches its mission-specified
battle area, it needs to engage in hiding-related tasks.
In general, a battle area could be of an arbitrary
(convex) shape, or specified in terms of landmarks,
such as trees or rocks. The IP should be capable of
locating good hiding positions within this area and
moving between hiding positions while remaining
concealed from its enemy. This second terrain
reasoning capability, at least at this level of
generality, is very much an issue for future research.
At present, we have restricted the battle area to be a
rectangle. One side of this rectangular area, typically
coinciding with a ridge or a tree line, is a mission-specified
line segment. This is in essence considered
to be an imaginary wall, and any movement behind it
is assumed to be hidden from the enemy. An RWA
IP hides in a small rectangular area (defined with a
width of 100 meters) behind this imaginary wall.
When relocating to a new hiding position, it uses the
NOE flight method to remain at a low altitude and
thus hidden behind the wall. The approximations of a
wall and a rectangular area for hiding are both based
on our previous work in the groundworld domain.
Groundworld involved a simulated terrain with
random configurations of horizontal and vertical
walls, where an intelligent agent had to hide behind a
wall to escape from another agent pursuing it (Stobie
et al., 1992, Tambe and Rosenbloom, 1993).
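
The imaginary-wall approximation reduces hiding to simple plane geometry. The Python sketch below tests whether a position lies in the hiding rectangle behind a wall segment. It makes several assumptions not stated in the text: positions are 2D coordinates in meters, "behind" means the side of the wall opposite the enemy, and the 100 meter width is read as the depth of the rectangle measured away from the wall.

```python
import math


def in_hiding_area(p, wall_a, wall_b, enemy, depth_m=100.0):
    """True if point p lies within depth_m behind the wall segment
    wall_a-wall_b, on the side opposite the enemy."""
    ax, ay = wall_a
    wx, wy = wall_b[0] - ax, wall_b[1] - ay
    length_sq = wx * wx + wy * wy
    if length_sq == 0:
        raise ValueError("wall endpoints must differ")
    length = math.sqrt(length_sq)

    def along_and_side(q):
        qx, qy = q[0] - ax, q[1] - ay
        t = (qx * wx + qy * wy) / length_sq  # 0..1 along the wall
        s = (wx * qy - wy * qx) / length     # signed distance from the wall
        return t, s

    t, s = along_and_side(p)
    _, s_enemy = along_and_side(enemy)
    if s_enemy == 0:
        raise ValueError("enemy lies on the wall line")
    behind = s * s_enemy < 0  # opposite side from the enemy
    return 0.0 <= t <= 1.0 and behind and abs(s) <= depth_m


# Hypothetical ridge running east-west, with the enemy 4 km to the north.
WALL_A, WALL_B, ENEMY = (0.0, 0.0), (1000.0, 0.0), (500.0, 4000.0)
print(in_hiding_area((300.0, -50.0), WALL_A, WALL_B, ENEMY))   # behind ridge
print(in_hiding_area((300.0, 50.0), WALL_A, WALL_B, ENEMY))    # exposed side
```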

4. Sharing and Reuse

RWA pilots’ missions have some requirements
(such as identifying enemy vehicles, firing missiles at
target vehicles and flying in formation) in common
with those of FWA pilots. These commonalities may
be exploited to cut down development time by
sharing or reusing both code and concepts from Soar-based
FWA pilots in the development of RWA pilots.
For instance, for an FWA IP, the code for firing a
missile involves three operators that orient its aircraft
towards its target, then push a fire button to actually
launch the missile, and then guide the missile (should
the missile require guidance) via radar (or other)
illumination of the target. These three operators can
be reused in an RWA IP. At present, a Soar-based
RWA IP has 44 operators, with 25 (that is, about
57%) reused in some form from the Soar-based FWA
IPs. The 19 new operators are those involved with
terrain reasoning tasks such as flying NOE, masking
and unmasking. This sharing is accomplished simply
by loading the appropriate operators from an FWA IP’s
code into an RWA IP.

Differences in concepts and terminology, however,
make some of the sharing problematic. For example,
for FWA pilots engaged in air-to-air missions, the
concept of launch-acceptability-region or LAR of a
missile combines both the range to a target and the
target aspect (angle between the target’s current
heading and the straight line joining the target and the
FWA pilot’s current locations). Thus, if a target is
heading towards the FWA pilot with a 0° target
aspect, the missile may be fired from a long range;
but the range is reduced substantially if the target has
a 180° target aspect. In contrast, for an RWA pilot,
the target aspect is irrelevant in calculating a
missile’s LAR; the missile may be fired at an
equally long range irrespective of the target aspect.
This creates a significant difference in the concept of
a missile LAR for an FWA and an RWA IP, making
the sharing of missile-LAR-related code difficult.
There is an accompanying difference in the
terminology as well: the RWA pilot refers to the
missile LAR as a missile constraint.

At least some of these apparent discrepancies in
the two IPs’ concepts, and potentially their
terminology, could be resolved if the agents reason
about the concepts from first principles. For instance,
agents could calculate a missile’s LAR from first
principles, based on the relative velocities (speed and
direction) of the missile and the target. Since an
FWA IP’s target in air-to-air combat is a fighter jet
moving at a speed that may be only a half to a fifth its
missile speed, its angle of movement (target aspect)
becomes an important factor in calculating LAR. In
particular, a target moving towards the FWA allows a
missile to be fired from a much longer range; while a
target that is moving away requires that the missile be
fired from a much closer range, so that the missile
may catch up with the target before expending all its
fuel. In contrast, an RWA IP’s target is moving two
orders of magnitude slower than its missile; the
angle of the target’s movement has a negligible
impact on the missile range. In other words, with the
first principles calculations, the target aspect
discrepancy automatically disappears. It will appear
important in an FWA IP’s calculations, and negligible in
an RWA IP’s calculations.
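
A deliberately simplified kinematic model shows how the target aspect discrepancy disappears in a first-principles calculation. The Python sketch below assumes a missile that flies in a straight line at constant speed for a fixed time (standing in for its fuel limit) and a target that holds constant speed and heading; with target aspect a (0° meaning the target heads straight at the launcher), the longest range R from which an intercept is still possible satisfies R = T (v_t cos a + sqrt(v_m^2 - v_t^2 sin^2 a)). This model and all numbers are illustrative assumptions, not the LAR computation used by the agents.

```python
import math


def max_launch_range(missile_speed, target_speed, aspect_deg, flight_time):
    """Longest launch range allowing an intercept within flight_time,
    for a constant-speed missile and a constant-velocity target.
    Requires missile_speed > target_speed. Units must be consistent."""
    a = math.radians(aspect_deg)
    closing = target_speed * math.cos(a)   # component toward the launcher
    crossing = target_speed * math.sin(a)  # component across the line of sight
    return flight_time * (closing + math.sqrt(missile_speed**2 - crossing**2))


MISSILE = 1000.0  # m/s, hypothetical
BURN = 20.0       # s of flight, hypothetical

for label, target_speed in (("fighter jet", 500.0), ("ground vehicle", 10.0)):
    head_on = max_launch_range(MISSILE, target_speed, 0.0, BURN)
    tail = max_launch_range(MISSILE, target_speed, 180.0, BURN)
    print(f"{label}: head-on {head_on:.0f} m, tail-chase {tail:.0f} m, "
          f"ratio {head_on / tail:.2f}")
```

With a target at half the missile's speed, the head-on range in this model is three times the tail-chase range; with a target two orders of magnitude slower, the two ranges differ by about 2%, so the same calculation serves both kinds of agent.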


While such calculations from first principles would
facilitate sharing, the calculations themselves may be
prohibitively expensive, and hinder real-time
performance. Soar’s chunking (learning) could
potentially compile such first principles calculations
into new rules and alleviate this cost. However, that
remains an issue for future work. We are currently
relying on a lower cost alternative, where a
problematic aspect of the agent code is rewritten
when it is reused.

5.  Current  Status  and  Future  Work 

As  of  February  1995,  the  RWA  agents  are  capable 
of  performing  a  complete  attrit  mission,  which 
involves  flying  to  a  battle  area  using  one  of  the 
possible  flight  methods,  followed  by  masking, 
unmasking,  firing  missiles  at  targets,  and  relocating 
to  a  different  masking  location  between  missile 
firings.  We  have  run  scenarios  with  up  to  four  RWA 
IPs executing the attrit mission.

At present, the RWA IPs can fly in coordination, in
pairs. Extending this work to enable coordinated
mission execution involving a platoon or a company
of RWA agents (with a platoon and a company
commander) is at the top of our agenda for future
work. Agents at higher echelons of command, such
as a company commander, may also bring up issues
of communication and mission planning, which we
have currently not addressed. Other issues for future
work, mentioned in previous sections, include
improvement in terrain reasoning for hiding, and in
code/concept sharing among Soar agents.